POLYNUCLEOTIDES CONTROLLING THE EXPRESSION OF AND CODING FOR GENE B IN TOMATO

FIELD AND BACKGROUND OF THE INVENTION

The present invention relates to novel polynucleotide sequences
isolated from tomato and, more particularly, to a novel lycopene cyclase
gene and to novel control elements directing its specific expression in
chromogenic tissues of plants, e.g., fruit and flower.

Carotenoids - functions and biosynthesis: Carotenoids constitute
one of the largest classes of pigments in nature. In photosynthetic
organisms carotenoids serve two major functions: as accessory pigments
for light harvesting, and as protective agents against photooxidation
processes in the photosynthetic apparatus. Another important role of
carotenoids in plants, as well as in some animals, is that of providing
distinctive pigmentation. Most of the orange, yellow, or red colors found in
the flowers, fruits and other organs of many higher plant species are due to
accumulation of carotenoids in the cells.

The biosynthesis of carotenoids has been reviewed extensively
(Britton, 1988; Sandmann, 1994a). Carotenoids are produced from the
general isoprenoid biosynthetic pathway, which in plants takes place in the
chloroplasts of photosynthetic tissues and the chromoplasts of fruits and
flowers.

The first unique step in carotenoid biosynthesis is the head-to-head
condensation of two molecules of geranylgeranyl pyrophosphate (GGPP) to
produce phytoene (Figure 1). All the subsequent steps in the pathway occur
in association with membranes. Four desaturation (dehydrogenation)
reactions convert phytoene to lycopene via phytofluene, ζ-carotene and
neurosporene as intermediates. Two cyclization reactions convert lycopene
to β-carotene (Figure 1). Further reactions add various oxygen-containing
side groups, forming the various xanthophyll species (not shown).

It has been established in recent years that four enzymes in plants
catalyze the biosynthesis of β-carotene from GGPP: phytoene synthase,
phytoene desaturase, ζ-carotene desaturase and lycopene cyclase (reviewed
in Sandmann, 1994b). All enzymes in the pathway are nuclear encoded.
Genes for phytoene synthase and phytoene desaturase have been previously
cloned from tomato (Ray et al., 1992; Pecker et al., 1992).

The red color of ripe tomatoes is provided by lycopene, a linear
carotene which accumulates during fruit ripening as membrane-bound
crystals in chromoplasts (Laval-Martin et al., 1975). It is presumed to serve
as an attractant for predators that eat the fruit and disperse the seeds.
Accumulation of lycopene begins at the "breaker" stage of fruit ripening,
after the fruit has reached the "mature green" stage. In the "breaker" stage,
which is marked by the onset of the color change from green to orange,
chlorophyll is degraded and chloroplasts turn into chromoplasts
(Gillaspy et al., 1993; Grierson and Schuch, 1993). Total carotenoid
concentration increases 10- to 15-fold during the transition from
"mature green" to "red", mainly owing to a 300-fold increase in lycopene
(Fraser et al., 1994).

The cDNA which encodes lycopene β-cyclase, CrtL-b, was cloned
from tomato (Lycopersicon esculentum cv. VF36) and tobacco (Nicotiana
tabacum cv. Samsun NN) (Pecker et al., 1996; U.S. Pat. application No.
08/399,561 and PCT/US96/03044 (WO 96/28014), both of which are
incorporated by reference as if fully set forth herein) and was functionally
expressed in Escherichia coli. This enzyme converts lycopene to β-carotene
by catalyzing the formation of two β-rings, one at each end of the linear
carotene. The enzyme interacts with half of the carotenoid molecule and
requires a double bond at the C-7,8 (or C-7',8') position. Inhibition
experiments in E. coli indicated that lycopene cyclase is the target site of
the inhibitor 2-(4-methylphenoxy)triethylamine hydrochloride (MPTA;
Pecker et al., 1996). The primary structure of lycopene cyclase in higher
plants is significantly conserved with that of the enzyme from cyanobacteria
but differs from that of the non-photosynthetic bacterium Erwinia (Pecker et
al., 1996). Levels of the mRNAs of CrtL-b and of Pds, which encodes
phytoene desaturase, were measured in leaves, flowers and ripening fruits of
tomato. In contrast to the genes that encode enzymes of early steps in the
carotenoid biosynthesis pathway, whose transcription increases during the
"breaker" stage of fruit ripening, the level of CrtL-b mRNA decreases at this
stage (Pecker et al., 1996). Hence, the accumulation of lycopene in tomato
fruits is apparently due to a down-regulation of the lycopene cyclase gene
that occurs at the breaker stage of fruit development. This conclusion
supports the hypothesis that transcriptional regulation of gene expression is
a predominant mechanism of regulating carotenogenesis.

The search for tissue-specific control elements in plants is ongoing;
however, only a limited number of tissue-specific control elements capable
of specifically directing gene expression in chromogenic tissues (fruit,
flower) have so far been isolated. These include the promoters of the genes
E4 and E8 (Montgomery et al., 1993), which are up-regulated by the increase
in ethylene concentration during tomato fruit ripening, and of the tomato
2A11 gene (Van Haaren and Houck, 1991) and the polygalacturonase (PG)
gene (Nicholass et al., 1995; Montgomery et al., 1993), which are
up-regulated in tomato fruits during ripening.

There is thus a widely recognized need for, and it would be highly
advantageous to have, novel tissue-specific control elements capable of
specifically directing gene expression in chromogenic tissues.

The search for structural genes encoding enzymes associated with
carotenogenesis is ongoing, and every new gene isolated not only provides
insight into carotenogenesis but also provides a tool to control and modify
carotenogenesis for commercial purposes (Hirschberg et al., 1997;
Cunningham and Gantt, 1998).

There is thus a widely recognized need for, and it would be highly
advantageous to have, a novel lycopene cyclase capable of altering the
composition of carotenoids in carotenoid-producing organisms.

SUMMARY OF THE INVENTION

According to one aspect of the present invention there is provided an
isolated complementary or genomic DNA segment comprising a nucleotide
sequence coding for a polypeptide having an amino acid sequence selected
from the group consisting of SEQ ID NOs: 17, 18 and 19 and functional
naturally occurring and man-induced variants thereof, with the proviso
that the polypeptide has a major lycopene cyclase catalytic activity.

According to further features in preferred embodiments of the
invention described below, the nucleotide sequence is selected from the
group consisting of SEQ ID NOs: 8, 9, 10 and 11 and functional naturally
occurring and man-induced variants thereof.

According to still further features in the described preferred
embodiments the nucleotide sequence is a cDNA or a genomic DNA
isolated from tomato.

According to another aspect of the present invention there is
provided a polypeptide comprising an amino acid sequence selected from
the group consisting of SEQ ID NOs: 17, 18 and 19 and functional naturally
occurring and man-induced variants thereof, the polypeptide having a major
lycopene cyclase catalytic activity.

According to another aspect of the present invention there is
provided a transduced cell overexpressing a polypeptide including an amino
acid sequence selected from the group consisting of SEQ ID NOs: 17, 18
and 19 and functional naturally occurring and man-induced variants thereof,
the polypeptide having a major lycopene cyclase catalytic activity, the cell
therefore overproducing β-carotene at the expense of lycopene.

According to still further features in the described preferred
embodiments the transduced cell is selected from the group consisting of a
prokaryotic cell and a eukaryotic cell.

According to still further features in the described preferred
embodiments the eukaryotic cell is of a higher plant.

According to still further features in the described preferred 
embodiments the cell forms a part of a transgenic plant. 

According to yet another aspect of the present invention there is
provided a method of down-regulating production of β-carotene in a cell,
comprising the step of introducing into the cell at least one anti-sense
polynucleotide sequence capable of base pairing with messenger RNA
coding for a polypeptide including an amino acid sequence selected from
the group consisting of SEQ ID NOs: 17, 18 and 19 and functional naturally
occurring and man-induced variants thereof, the polypeptide having a major
lycopene cyclase catalytic activity, the cell therefore underproducing
β-carotene from lycopene.

According to still further features in the described preferred
embodiments the at least one anti-sense polynucleotide sequence includes a
synthetic oligonucleotide.

According to still further features in the described preferred
embodiments the synthetic oligonucleotide includes a man-made
modification rendering the synthetic oligonucleotide more stable in the
cellular environment.

According to still further features in the described preferred
embodiments the synthetic oligonucleotide is selected from the group
consisting of methylphosphonate oligonucleotide, monothiophosphate
oligonucleotide, dithiophosphate oligonucleotide, phosphoramidate
oligonucleotide, phosphate ester oligonucleotide, bridged phosphorothioate
oligonucleotide, bridged phosphoramidate oligonucleotide, bridged
methylenephosphonate oligonucleotide, dephospho internucleotide analogs
with siloxane bridges, carbonate bridge oligonucleotide, carboxymethyl
ester bridge oligonucleotide, acetamide bridge oligonucleotide, carbamate
bridge oligonucleotide, thioether bridge oligonucleotide, sulfoxy bridge
oligonucleotide, sulfono bridge oligonucleotide and α-anomeric bridge
oligonucleotide.

According to still further features in the described preferred
embodiments the at least one anti-sense polynucleotide sequence is encoded
by an expression vector.

According to still further features in the described preferred
embodiments the cell is selected from the group consisting of a prokaryotic
cell and a eukaryotic cell.

According to still further features in the described preferred
embodiments the eukaryotic cell is of a higher plant.

According to still further features in the described preferred 
embodiments the cell forms a part of a transgenic plant. 

According to still another aspect of the present invention there is
provided an expression construct for directing expression of a gene in
fruit or flower, comprising a regulatory sequence selected from the group
consisting of an upstream region of a B allele of tomato and an upstream
region of a b allele of tomato.

According to still further features in the described preferred
embodiments the expression construct comprises a functional part of
nucleotides 1-1210 of SEQ ID NO: 14 or of nucleotides 1-1600 of SEQ ID
NO: 15, or functional naturally occurring and man-induced variants thereof.

According to still further features in the described preferred
embodiments the expression construct comprises at least one control
element having a sequence selected from the group consisting of SEQ ID
NOs: 21-24, all derived from SEQ ID NO: 11, and functional naturally
occurring and man-induced variants thereof.

According to still further features in the described preferred
embodiments the expression construct is selected from the group consisting
of a plasmid, cosmid, phage, virus, bacmid and artificial chromosome.

According to still further features in the described preferred 
embodiments the expression construct is designed to integrate into a 
genome of a host. 

According to yet another aspect of the present invention there is
provided a transduced cell or transgenic plant transduced with the
above-described expression construct.

According to still another aspect of the present invention there is
provided a method of isolating, from a species, a gene encoding a
polypeptide having an amino acid sequence homologous to SEQ ID NOs: 17,
18 and 19 and having a major lycopene cyclase catalytic activity, the
method comprising the step of screening a complementary or genomic DNA
library, prepared from isolated RNA or genomic DNA extracted from the
species, with a probe having a sequence derived from SEQ ID NOs: 8, 9, 10
or 11, and isolating clones reacting with the probe.

The present invention successfully addresses the shortcomings of the
presently known configurations by providing novel polynucleotides
controlling the expression of genes in plant fruits and flowers, and a novel
polynucleotide encoding lycopene cyclase.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with
reference to the accompanying drawings, wherein:

FIG. 1 presents the pathway of carotenoid biosynthesis in plants and
algae. Enzymes are indicated by their gene assignment symbols: aba2,
zeaxanthin epoxidase; CrtL-b, lycopene β-cyclase; CrtL-e, lycopene
ε-cyclase; CrtR-b, β-ring hydroxylase; CrtR-e, ε-ring hydroxylase; Pds,
phytoene desaturase (crtP in cyanobacteria); Psy, phytoene synthase (crtB
in cyanobacteria); Zds, ζ-carotene desaturase (crtQ in cyanobacteria).
GGDP, geranylgeranyl diphosphate.

FIG. 2 shows fine genetic mapping and molecular organization of B
on chromosome 6 of the tomato linkage map. The linkage map was adapted
from Eshed and Zamir (1995). The relevant chromosomal segments from
L. pennellii that were introgressed into the L. esculentum lines IL 6-2 and
IL 6-3 are represented by black bars. A high-resolution genetic map around
B is displayed with genetic distances in map units (cM). Positions of the
YAC inserts are designated under the map.

FIG. 3 demonstrates levels of mRNA (relative units) during fruit
ripening of wild-type tomato L. esculentum. Data are derived from
quantifying the DNA products in the RT-PCR analysis of total RNA
extracted at different stages of fruit development. Ripening stages: IG,
immature green; MG, mature green; B, breaker; O, orange; P, pink; R, red.

FIG. 4 demonstrates levels of mRNA (relative units) during fruit
ripening of the tomato mutant High-beta. Data are derived from
quantifying the DNA products in the RT-PCR analysis of total RNA
extracted at different stages of fruit development. Ripening stages: G, green;
MG, mature green; B, breaker; O, orange; P, pink; R, red.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is of novel polynucleotide sequences isolated
from tomato which can be used to control gene expression in plant
chromogenic tissues, especially fruit and flower. The present invention is
further of polynucleotide sequences isolated from tomato which encode a
lycopene cyclase that can be used to alter carotenogenesis in
carotenoid-producing organisms.

The principles and operation of the present invention may be better
understood with reference to the drawings and accompanying descriptions.

Before explaining at least one embodiment of the invention in detail,
it is to be understood that the invention is not limited in its application to the
details of construction and the arrangement of the components set forth in
the following description or illustrated in the drawings. The invention is
capable of other embodiments or of being practiced or carried out in various
ways. Also, it is to be understood that the phraseology and terminology
employed herein are for the purpose of description and should not be
regarded as limiting.

Fruits of the cultivated tomato (Lycopersicon esculentum) accumulate
lycopene, a red carotenoid pigment. A dominant allele of gene B
determines accumulation of β-carotene, at the expense of lycopene, in the
fruits of the tomato mutant "high-beta", resulting in a unique orange color.
Conversion of lycopene to β-carotene in the carotenoid biosynthesis pathway
is catalyzed by the enzyme lycopene β-cyclase. It was previously shown that
CrtL-b, the gene for lycopene β-cyclase, does not map to the B locus in the
tomato genetic map. This ruled out the possibility that a mutation in the
lycopene β-cyclase encoded by CrtL-b causes the high-beta phenotype.

The locus B was mapped to chromosome 6. The dominant allele
B was found in the tomato introgression line IL 6-2. The DNA of B was
identified and cloned by a map-based (positional) cloning method. Its
nucleotide sequence was determined and revealed a novel type of lycopene
cyclase enzyme, whose primary structure has some similarity to other
lycopene cyclases and to the enzyme capsanthin-capsorubin synthase from
pepper. In addition, a nucleotide sequence was identified in the B allele of
the mutant High-beta that functions as a strong promoter during fruit
development.

Thus, according to one aspect of the present invention there is
provided an isolated complementary or genomic DNA segment comprising
a nucleotide sequence coding for a polypeptide having an amino acid
sequence selected from the group consisting of SEQ ID NOs: 17, 18 and 19
and functional naturally occurring and man-induced variants thereof. The
polypeptide has a major lycopene cyclase catalytic activity. Polypeptides
which are at least 70%, 75%, 80%, 85%, 90%, 95% or more identical in
amino acid residues to SEQ ID NOs: 17, 18 or 19 are also within the scope
of the present invention.
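
The identity thresholds above can be made concrete with a short sketch. This Python example is illustrative only and rests on assumptions not stated in the text: the two sequences are assumed to be already aligned (gaps written as "-"), gap columns are excluded from the denominator, and the toy sequences are hypothetical rather than taken from SEQ ID NOs: 17-19. Other conventions, such as dividing by the full length of the reference sequence, give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the denominator (an assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, NOT taken from SEQ ID NOs: 17-19.
query = "MEALLKPFPSL"
variant = "MEALLRPFP-L"
pid = percent_identity(query, variant)
print(f"{pid:.1f}% identical; meets 70% threshold: {pid >= 70}")
# 90.0% identical; meets 70% threshold: True
```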

As used herein in the specification and in the claims section below,
the phrase "major lycopene cyclase catalytic activity" refers to catalytic
activity mainly directed at the conversion of lycopene to β-carotene by
catalyzing the formation of two β-rings, one at each end of the linear
carotene, such that, if the enzyme is introduced into lycopene-accumulating
E. coli cells, these cells also accumulate β-carotene up to at least a few
percent (e.g., 5%), preferably about 15% or more, of total carotenoids
therein, by symmetric formation of two β-ionone rings on the linear
lycopene molecules therein.
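
The quantitative part of this definition can be expressed as a simple check. The sketch below is a hypothetical illustration: it assumes that the amount of each carotenoid (for example, an HPLC peak area) has already been measured, uses the 5% lower example from the definition as the default threshold, and the sample values are invented for illustration, not experimental data.

```python
def beta_carotene_share(amounts: dict) -> float:
    """Return beta-carotene as a percentage of total carotenoids."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total carotenoid amount must be positive")
    return 100.0 * amounts.get("beta-carotene", 0.0) / total


def has_major_cyclase_activity(amounts: dict, threshold_pct: float = 5.0) -> bool:
    # 5% is the lower example given in the definition; about 15% or more is preferred.
    return beta_carotene_share(amounts) >= threshold_pct


# Hypothetical amounts from lycopene-accumulating E. coli expressing a candidate gene.
sample = {"lycopene": 80.0, "gamma-carotene": 6.0, "beta-carotene": 14.0}
print(beta_carotene_share(sample))         # 14.0
print(has_major_cyclase_activity(sample))  # True
```

Only the bicyclic product (β-carotene) is counted, since the definition requires formation of two rings per lycopene molecule.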

According to a preferred embodiment of the invention the nucleotide
sequence is as set forth in SEQ ID NOs: 8, 9, 10 or 11, or functional
naturally occurring or man-induced variants thereof. As further shown
below, these sequences are genomic and complementary DNA sequences
which were derived from certain tomato cultivars or lines while reducing
the present invention to practice. However, nucleotide sequences which are
at least 70%, 75%, 80%, 85%, 90%, 95% or more identical to SEQ ID NOs:
8, 9, 10 or 11 are also within the scope of the present invention.

According to another aspect of the present invention there is
provided a polypeptide comprising an amino acid sequence selected from
the group consisting of SEQ ID NOs: 17, 18 and 19 and functional naturally
occurring and man-induced variants thereof, the polypeptide having a major
lycopene cyclase catalytic activity. Homologous polypeptides as described
above and further detailed hereinunder are also envisaged.

According to another aspect of the present invention there is
provided a transduced cell overexpressing a polypeptide including an amino
acid sequence selected from the group consisting of SEQ ID NOs: 17, 18
and 19, and functional naturally occurring and man-induced variants
thereof, the polypeptide having a major lycopene cyclase catalytic activity,
the cell therefore overproducing β-carotene at the expense of lycopene.

The cell according to the present invention can be of any type. For
example, the cell can be a prokaryotic cell or a eukaryotic cell. Preferably
the cell is of a higher plant. The cell preferably forms a part of a transgenic
plant. Methods of transducing cells (and cells in organisms to form
transgenic organisms) are well known in the art and do not require further
description herein. Protocols are available, for example, in Sambrook et al.
(1989).

As used herein in the specification and in the claims section below,
the term "transduced" refers to the result of a process of inserting nucleic
acids into cells. The insertion may, for example, be effected by
transformation, viral infection, injection, transfection, gene bombardment,
electroporation or any other means effective in introducing nucleic acids
into cells. Following transduction, the nucleic acid is either integrated, in
whole or in part, into the cell's genome (DNA), or remains external to the
cell's genome, thereby providing stably transduced or transiently
transduced cells.

According to yet another aspect of the present invention there is
provided a method of down-regulating production of β-carotene in a cell,
comprising the step of introducing into the cell at least one anti-sense
polynucleotide sequence capable of base pairing with messenger RNA
coding for a polypeptide including an amino acid sequence selected from
the group consisting of SEQ ID NOs: 17, 18 and 19 and functional naturally
occurring and man-induced variants thereof, the polypeptide having a major
lycopene cyclase catalytic activity, the cell therefore underproducing
β-carotene from lycopene. Again, the cell can be of any type. For example,
the cell can be a prokaryotic cell or a eukaryotic cell. Preferably the cell is
of a higher plant. The cell preferably forms a part of a transgenic plant.

As used herein in the specification and in the claims section below,
the term "down-regulating" also means reducing, lowering, inhibiting, etc.,
e.g., permanently or transiently reducing.

As used herein in the specification and in the claims section below,
the term "production" also means formation or generation.

As used herein in the specification and in the claims section below,
the term "introducing" also means providing with or inserting.

The at least one anti-sense polynucleotide sequence according to the
present invention can include one or several synthetic oligonucleotides
capable of base pairing with messenger RNA derived from the
above-identified nucleotide sequences. The synthetic oligonucleotide
preferably includes a man-made modification rendering it more stable in
the cellular environment. The modified oligonucleotide can be, for
example, a methylphosphonate oligonucleotide, monothiophosphate
oligonucleotide, dithiophosphate oligonucleotide, phosphoramidate
oligonucleotide, phosphate ester oligonucleotide, bridged phosphorothioate
oligonucleotide, bridged phosphoramidate oligonucleotide, bridged
methylenephosphonate oligonucleotide, dephospho internucleotide analogs
with siloxane bridges, carbonate bridge oligonucleotide, carboxymethyl
ester bridge oligonucleotide, acetamide bridge oligonucleotide, carbamate
bridge oligonucleotide, thioether bridge oligonucleotide, sulfoxy bridge
oligonucleotide, sulfono bridge oligonucleotide or an α-anomeric bridge
oligonucleotide. For further details the reader is referred to Cook (1991).
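
Base pairing with the messenger RNA means that an anti-sense sequence is the reverse complement of its target. The sketch below derives a DNA anti-sense sequence from an mRNA segment. As a hypothetical target it uses the region beginning at the ATG of the forward primer (SEQ ID NO:1), assuming that this ATG is the initiator codon; it does not model any of the backbone modifications listed above, which change the chemistry of an oligonucleotide but not its base sequence.

```python
def antisense_dna(sense: str) -> str:
    """Return the DNA reverse complement of a sense-strand RNA or DNA sequence.

    The result base-pairs antiparallel with the sense sequence (e.g. an mRNA segment).
    """
    seq = sense.upper().replace("U", "T")  # treat RNA input as DNA
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement each base, then reverse to keep the 5'->3' convention.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical target: mRNA form of SEQ ID NO:1 from its ATG onward (leading A dropped).
target_mrna = "AUGGAAGCUCUUCUCAAGCCU"
print(antisense_dna(target_mrna))  # AGGCTTGAGAAGAGCTTCCAT
```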

Alternatively, the anti-sense polynucleotide sequence is encoded by
an anti-sense expression vector. Such vectors are well known in the art and
are commercially available, for example, pBI101, pBI121 and pBI221
(Clontech).

Further according to the present invention, there is provided an
expression construct for directing expression of a gene in the fruit or flower
of a plant. The expression construct according to the present invention
includes a regulatory sequence selected from the group consisting of an
upstream region of a B allele of tomato and an upstream region of a b allele
of tomato. Thus, according to a preferred embodiment of the invention, the
expression construct includes a functional part of nucleotides 1-1210 of
SEQ ID NO: 14 or of nucleotides 1-1600 of SEQ ID NO: 15, or functional
naturally occurring and man-induced variants thereof.

According to a preferred embodiment, the expression construct
includes at least one control element having a sequence selected from the
group consisting of SEQ ID NOs: 21-24, all derived from SEQ ID NO: 11,
and functional naturally occurring and man-induced variants thereof.

As further detailed in the Examples section hereinbelow, these
sequence elements, which are 26, 13, 9 and 8 bp long and start (at their 5'
ends) at nucleotides 859, 753, 479 and 306, respectively, of SEQ ID NOs: 11
and 15, are located upstream of the initiator methionine codon in the B
allele. They constitute the main difference between the B and b alleles, and
are therefore responsible for the differential expression of the B locus in
tomato.
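
Because the element positions are given as 1-based start coordinates plus lengths, the corresponding intervals, and the slicing of a sequence string, can be sketched as follows. The element labels are hypothetical, no particular mapping of these elements to individual SEQ ID NOs: 21-24 is assumed, and since the promoter sequence itself is not reproduced here, the slicing helper is shown on a short placeholder string.

```python
# (1-based start, length in bp) for the four elements, as stated in the text.
# The labels element_1..element_4 are hypothetical.
ELEMENTS = {
    "element_1": (859, 26),
    "element_2": (753, 13),
    "element_3": (479, 9),
    "element_4": (306, 8),
}


def element_span(start_1based: int, length: int) -> tuple:
    """Return the inclusive 1-based (start, end) interval of an element."""
    return (start_1based, start_1based + length - 1)


def extract_element(seq: str, start_1based: int, length: int) -> str:
    """Slice a 1-based, inclusive element out of a 0-based Python string."""
    begin = start_1based - 1
    end = begin + length
    if begin < 0 or end > len(seq):
        raise ValueError("element lies outside the sequence")
    return seq[begin:end]


for name, (start, length) in ELEMENTS.items():
    print(name, element_span(start, length))
# element_1 (859, 884)
# element_2 (753, 765)
# element_3 (479, 487)
# element_4 (306, 313)

print(extract_element("ACGTACGTAC", 3, 4))  # GTAC (placeholder sequence)
```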

The expression construct according to the present invention can be a
plasmid, cosmid, phage, virus, bacmid or artificial chromosome. Each of
these construct types has unique features that make it more suitable for
some applications than for others, as is well known in the art. Regardless of
its type, according to a preferred embodiment of the present invention the
expression construct is designed to integrate into a genome of a host, such
that stable transfectants are obtainable. However, the scope of the present
invention is not limited to such constructs; constructs designed for transient
transfection are also within the scope of the present invention. In any case,
the construct preferably includes at least one positive and/or negative
selection gene and is suitable for transformation, transfection,
transgenization and gene knock-in procedures.

According to yet another aspect of the present invention there is
provided a transduced cell or a transgenic plant transduced with the
above-described expression construct. Such a cell or plant expresses the
gene located downstream of the regulatory sequence in a developmentally
controlled manner, mimicking the expression of the lycopene cyclase gene
of the B locus in b or B tomato plants.

According to still another aspect of the present invention there is
provided a method of isolating, from a species, a gene encoding a
polypeptide having an amino acid sequence homologous to SEQ ID NOs:
17, 18 and 19 and having a major lycopene cyclase catalytic activity. In this
method, a complementary or genomic DNA library, prepared from isolated
RNA or genomic DNA extracted from the species, is screened with a probe
having a sequence derived from SEQ ID NOs: 8, 9, 10 or 11, and clones
reacting with the probe are isolated. Such clones are good candidates to
include segments of genes homologous to SEQ ID NOs: 8, 9, 10 or 11, and
these genes are in turn good candidates to encode a polypeptide having an
amino acid sequence homologous to SEQ ID NOs: 17, 18 and 19. 5' cloning
strategies, such as, but not limited to, RACE protocols, can be employed to
isolate full-length clones, as is well known in the art.

Thus, according to the present invention, the following uses of gene
B of tomato are anticipated:

(i) Increasing the content of β-carotene in tissues of transgenic
plants over-expressing gene B. This is an advantageous attribute in fruits
and vegetables because it will provide better nutritional value and enhanced
color;

(ii) Increasing the accumulation of lycopene in fruits and flowers
of transgenic plants by reducing the activity of B using anti-sense inhibition,
preferably via anti-sense expression; and

(iii) Achieving strong expression of transgenes specifically in
fruits and flowers using the promoter sequence of gene B from High-beta
tomato cultivars.

Each of the various embodiments and aspects of the present invention as
delineated hereinabove and as claimed in the claims section below finds
experimental support in the Examples section that follows.

EXAMPLES 



Bacteria and plants: E. coli strain XL1-Blue was used in all
experiments described herein. Tomato (Lycopersicon esculentum) cv. M82
served as the "wild-type" strain in the fruit ripening measurements. The
introgression lines IL 6-2 and IL 6-3 (Eshed and Zamir, 1994) were used as
a source of the B mutation and employed for fine mapping of the B locus.

Fine mapping and cloning of the B locus: As a source of the B
mutation, the lines IL 6-2 or IL 6-3 (BB) were used (Eshed and Zamir,
1995). Each line was crossed with the cultivated tomato cv. M82 (bb), and
the hybrids were selfed to create an F2 population that segregated for both
the B phenotype and the introgressed DNA segment. A total of 1335 F2
plants were scored for RFLPs at the markers CT193 and TG578 (Pnueli et
al., 1998; Tanksley et al., 1992) and for the B phenotype, and recombinant
plants were collected. The 32 resulting recombinants were further screened
with all the available RFLP probes surrounding B to accurately map the
mutated locus (Figure 2). One RFLP marker, TM16 (Pnueli et al., 1998),
co-segregated with B at a resolution of less than 0.0375 cM.
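
The stated resolution appears to follow directly from the population size: each F2 plant carries two meiotic products of the F1, so 1335 plants represent 2670 scored chromosomes, and a single crossover among them corresponds to about 0.0375 cM. The sketch below reproduces this arithmetic. Reading percent recombination directly as cM, without a mapping function, is an assumption that is reasonable only for such short distances; the estimate of the CT193-TG578 interval further assumes that each of the 32 recombinant plants carries exactly one recombinant chromosome.

```python
def map_resolution_cm(n_f2_plants: int) -> float:
    """cM represented by a single crossover among all chromosomes scored in an F2."""
    scored_chromosomes = 2 * n_f2_plants  # each F2 plant carries two F1 meiotic products
    return 100.0 / scored_chromosomes


def interval_cm(n_recombinant_chromosomes: int, n_f2_plants: int) -> float:
    """Approximate distance as percent recombination (no mapping function applied)."""
    return 100.0 * n_recombinant_chromosomes / (2 * n_f2_plants)


print(round(map_resolution_cm(1335), 4))  # 0.0375
# Assumes each of the 32 recombinant plants carries one recombinant chromosome.
print(round(interval_cm(32, 1335), 2))    # 1.2
```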

The tomato genomic library in YACs was screened with DNA of
markers TM16 and TG275. Two overlapping YAC clones, designated 271
and 310, were identified by hybridization. DNA sequences from the ends of
the inserts in these YACs were amplified by PCR as previously described
(Pnueli et al., 1998) and were used as molecular probes to screen the 32
recombinant plants for restriction fragment length polymorphism (RFLP).
The YAC ends were mapped as shown in Figure 2. It was established that
YAC 310 overlaps the B locus, which ensured that the 200 kb insert of YAC
310 contains the B gene. In contrast, recombination between the left end of
YAC 271 and the B phenotype indicated that this YAC clone did not carry
the B locus, and placed B in a relatively small region of YAC 310 that did
not overlap with YAC 271 (Figure 2).

The DNA insert of YAC 310 was cut with EcoRI and the resulting
fragments were subcloned in the vector λgt11. Two phage clones,
designated B1 and B3, co-segregated with the B locus and mapped to the
end of YAC 310. The nucleotide sequence of the insert of B1 was
determined. The B1 fragment was further used to screen a genomic library
of wild-type tomato (cv. VF36) in the lambda vector EMBL3 and a cosmid
library of L. pennellii. A single positive phage clone and a single positive
cosmid clone were isolated, respectively.

The B1 fragment was also used to screen 1.5 million plaques of a cDNA library from tomato fruit, and 3 identical clones were isolated. Nucleotide sequence analysis showed that the ca. 1300 bp inserts in these clones contained an open reading frame lacking its 5' end. The full-length cDNAs were then obtained by reverse-transcription polymerase chain reaction (RT-PCR) with RNA isolated from flowers and fruits of wild-type (VF-36) and high-beta (IL 6-3) plants. For the PCR reaction we used 5' primers based on the genomic sequence of the B1 insert and 3' primers based on the cloned cDNA. The full coding regions of the cDNAs of allele b from wild-type tomato (cv. VF-36) and of allele B from L. pennellii were cloned into the pBluescript KS- vector, and the resulting plasmids were designated pBESC and pBPENN, respectively. Comparison of the cDNA and genomic sequences revealed no introns in the genomic sequence of b (or B).
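The intron check described above amounts to asking whether the cDNA occurs as one uninterrupted, collinear stretch of the genomic sequence. The Python sketch below illustrates that idea under stated assumptions: both sequences are plain strings, matching is exact, and the helper names and toy sequences are illustrative rather than taken from this work. Allelic or sequencing differences would defeat exact matching, so an alignment tool would be used on real data.

```python
def clean(seq: str) -> str:
    """Uppercase a sequence and keep only nucleotide letters (drops spaces and position numbers)."""
    return "".join(ch for ch in seq.upper() if ch in "ACGTN")


def is_collinear_without_introns(cdna: str, genomic: str) -> bool:
    """True if the cDNA occurs as one contiguous block within the genomic sequence.

    Exact matching only: an intron, or any sequence difference, makes this False.
    """
    return clean(cdna) in clean(genomic)


# Toy sequences (hypothetical, not from the sequence listing).
cdna = "ATGGAAGCTC TTCTCAAGTG A"
genomic = "GAATTCAA" + clean(cdna) + "CCC"
genomic_with_intron = "ATGGAAGCTC" + "GTAAGTTTTAG" + "TTCTCAAGTGA"

print(is_collinear_without_introns(cdna, genomic))              # True
print(is_collinear_without_introns(cdna, genomic_with_intron))  # False
```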

DNA blot hybridization was done according to conventional techniques (Sambrook et al., 1989; Eshed and Zamir, 1994) at low stringency in a buffer containing 10x Denhardt's, 5x SSC, 50 mM phosphate buffer (pH 7), 1% SDS and 50 mg salmon sperm DNA (sheared, autoclaved and boiled before adding to the mixture). Filters were washed with 5x SSC at 65 °C.

Genomic DNA of tomato was prepared from 5 grams of leaf as 
previously described (Eshed and Zamir, 1995). 
Amplification of the full-length cDNA of the b allele by the polymerase chain reaction (PCR) was carried out with the following oligonucleotide primers, whose sequences were derived from the genomic sequence of the B1 clone (see below): Forward: 5'-AATGGAAGCTCTTCTCAAGCCT-3' (SEQ ID NO:1); Reverse: 5'-CACATTCAAAGGCTCTCTATCGC-3' (SEQ ID NO:2).
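Basic properties of the two primers listed above can be checked with a few lines of code. The Python sketch below computes length, GC content, a rough melting temperature by the Wallace rule (2 °C per A/T and 4 °C per G/C, a rule of thumb for short oligonucleotides that the text does not say was used) and the reverse complement. It also confirms that, apart from its leading A, the forward primer matches the first 21 nucleotides of the b cDNA (SEQ ID NO:8). The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C: 2*(A+T) + 4*(G+C). Only a rule of thumb for short primers."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


FORWARD = "AATGGAAGCTCTTCTCAAGCCT"    # SEQ ID NO:1
REVERSE = "CACATTCAAAGGCTCTCTATCGC"   # SEQ ID NO:2
SEQ8_START = "ATGGAAGCTCTTCTCAAGCCT"  # first 21 nt of SEQ ID NO:8

for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, len(primer), f"{gc_fraction(primer):.0%}", wallace_tm(primer),
          reverse_complement(primer))

print(FORWARD[1:] == SEQ8_START)  # True
```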

Total RNA was extracted from 1.5 grams of fruit or 0.1 gram of flower or leaf tissue as previously described (Pecker et al., 1996).
Levels of mRNA were measured by reverse transcription followed by polymerase chain reaction (RT-PCR) as previously described (Pecker et al., 1996), using the following oligonucleotides as primers for the PCR reaction. For amplification of the Psy gene, the following primers were employed: Forward1: 5'-TCGAGAACGGACGATG-3' (SEQ ID NO:3), Forward2 (internal): 5'-TGCAGAGAGACAGATG-3' (SEQ ID NO:4) and Reverse: 5'-ATTTCATGCTTTATCTTTGAAG-3' (SEQ ID NO:5).

For amplification of allele B: Forward: 5'-GCTGAAGTTGAAATTGTTGA-3' (SEQ ID NO:6) and Reverse: 5'-TCTCTTCCTCAATAACACTT-3' (SEQ ID NO:7).
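Given a template sequence, the product expected from a primer pair such as SEQ ID NOs:6 and 7 can be predicted in silico by locating the forward primer and, downstream of it, the reverse complement of the reverse primer. The sketch below is a simplified, exact-match illustration (the function names and the toy template are hypothetical); it ignores mismatches, secondary binding sites and primer binding on the opposite strand, which dedicated primer-design tools take into account.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the expected PCR product (forward site through reverse site), or None.

    Exact-match sketch: the forward primer is matched as written and the reverse
    primer through its reverse complement, which must lie downstream.
    """
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = t.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return t[start:end + len(reverse_site)]


B_FORWARD = "GCTGAAGTTGAAATTGTTGA"  # SEQ ID NO:6
B_REVERSE = "TCTCTTCCTCAATAACACTT"  # SEQ ID NO:7

# Hypothetical toy template: the two primer sites separated by a short spacer.
template = "TTT" + B_FORWARD + "CCCCC" + reverse_complement(B_REVERSE) + "GGG"
product = predict_amplicon(template, B_FORWARD, B_REVERSE)
print(len(product))  # 45
```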

Sequence analysis: DNA sequence analysis was performed on an ABI Prism 377 DNA sequencer (Perkin Elmer), and the data were processed with the ABI sequence analysis software. Nucleotide and amino acid sequence analyses and comparisons were done using the UWGCG software package.

Plasmids: Plasmid pACCRT-EIB, for expressing bacterial carotenoid biosynthesis genes in E. coli, was previously described (Cunningham et al., 1993). Plasmids pBESC and pBPENN were constructed by inserting a 1666 bp cDNA of the tomato b allele (from L. esculentum) or B allele (from L. pennellii), respectively, into the EcoRV site of the plasmid vector pBluescript KS (Stratagene®).

Pigment extraction and analysis: For extraction of pigments from E. coli, aliquots of 2 ml were taken from bacterial suspension cultures. The cells were harvested by centrifugation, washed once with water, resuspended in 2 ml of acetone and incubated at 65 °C for 10 minutes in the dark. The samples were centrifuged again at 13,000 g for 5 minutes, and the acetone supernatant containing the pigments was placed in a clean tube. More than 99% of the carotenoids were extracted by this procedure, as determined by re-extraction after breaking and grinding the samples. The pigment extract was blown to dryness under a stream of N2 and stored at -20 °C until required for analysis.

Fruit pigments were extracted from 1.0 gram of fresh tissue. The tissue was ground in 2 ml of acetone and incubated at room temperature in the dark for 10 minutes. Then 2 ml of dichloromethane were added, and the samples were agitated until all pigments were transferred to the supernatant, which was then filtered. To each sample, 4 ml of ether and 0.4 ml of 12% w/v NaCl/H2O were added, and the mixture was shaken gently until all pigment was transferred to the upper (ether) phase. The ether phase was collected, and the pigment extract was blown to dryness under a stream of N2 and stored at -20 °C until required for analysis.

Carotenoids were separated by reverse-phase HPLC on a Spherisorb ODS-2 column (5 μm silica, 3.2 mm x 250 mm; Phenomenex®). Samples of 50 μl of acetone-dissolved pigments were injected into a Waters 600 pump. The mobile phase consisted of acetonitrile:H2O (9:1) as solvent A and 100% ethyl acetate as solvent B, used in a linear gradient between A and B for 30 minutes at a flow rate of 1 ml per minute. Light absorption peaks were detected in the range of 200-600 nm with a Waters 996 photodiode-array detector. All spectra, including their fine absorbance structure, were recorded in the eluting HPLC solvent. Carotenoids were identified by their characteristic absorption spectra and typical retention times, which corresponded to those of lycopene and β-carotene standards. Peak areas were integrated with the Millennium chromatography software (Waters).
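The solvent composition delivered by the gradient described above can be written as a simple function of time. The sketch below assumes that the gradient runs linearly from 100% solvent A to 100% solvent B over the 30-minute program (the text gives the duration but not the end points) and expands solvent A into its acetonitrile:water (9:1) components; the function name is illustrative.

```python
def mobile_phase_composition(t_min: float, program_min: float = 30.0) -> dict:
    """Percent (v/v) of each solvent at time t for a linear A -> B gradient.

    Assumes 100% A at t = 0 and 100% B at t = program_min (clamped outside that
    window). Solvent A is acetonitrile:water 9:1; solvent B is 100% ethyl acetate.
    """
    fraction = min(max(t_min / program_min, 0.0), 1.0)
    percent_b = 100.0 * fraction
    percent_a = 100.0 - percent_b
    return {
        "acetonitrile": percent_a * 9 / 10,
        "water": percent_a / 10,
        "ethyl acetate": percent_b,
    }


FLOW_ML_PER_MIN = 1.0  # flow rate stated in the text
for t in (0, 15, 30):
    print(t, mobile_phase_composition(t), "eluted volume (ml):", FLOW_ML_PER_MIN * t)
```

At the midpoint of the program this gives 45% acetonitrile, 5% water and 50% ethyl acetate.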


EXPERIMENTAL RESULTS 

The only difference between the high-beta mutant and wild-type tomato is in fruit color, which results from accumulation of β-carotene at the expense of lycopene. It was therefore logical to assume that this mutation occurred in the gene encoding lycopene β-cyclase (CrtL-b). However, the CrtL-b cDNA previously cloned from tomato (Pecker et al., 1996) mapped to 2 loci, on chromosomes 4 and 10, but not to chromosome 6, where the B locus was mapped. Even under very low-stringency hybridization conditions, we were unable to detect hybridization of tomato CrtL-b-like sequences on chromosome 6.

Therefore, the only way to clone gene B, which is responsible for the high-beta phenotype, was by map-based ("positional") cloning.

Fine mapping of the B locus: As a source of the B mutation, the tomato lines IL-6-2 or IL-6-3 (BB) (Eshed and Zamir, 1995) were employed. Each line was crossed with the cultivated tomato cv. M-82 (bb), and the hybrids were selfed to create an F2 population that segregated for both the B phenotype and the introgressed DNA segment. A total of 1335 F2 plants were scored for the RFLP markers CT-193 and TG-578 (Pnueli et al., 1998; Tanksley et al., 1992) and for the B phenotype, and recombinant plants were collected. The 32 recombinants collected were further screened with all the available RFLP probes surrounding B to map the mutated locus accurately (Figure 2). One RFLP marker, TM-16 (Pnueli et al., 1998), co-segregated with B; that is, no recombination between them was detected at a resolution of 0.0375 cM.
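The resolution figure can be reproduced by simple counting, assuming the usual convention (not spelled out in the text) that each F2 plant contributes two meiotic products, so that N plants sample 2N gametes and a single recombinant gamete corresponds to 100/(2N) cM. Under that assumption, 1335 plants give about 0.0375 cM, and the 2000 F2 progeny of the ogC cross described below give 0.025 cM. The function name is illustrative.

```python
def single_recombinant_resolution_cm(n_f2_plants: int) -> float:
    """Map distance (cM) represented by one recombinant gamete among the 2N gametes of N F2 plants."""
    if n_f2_plants <= 0:
        raise ValueError("need at least one plant")
    return 100.0 / (2 * n_f2_plants)


# 1335 F2 plants scored for the B locus; at least 2000 F2 progeny in the ogC cross.
print(round(single_recombinant_resolution_cm(1335), 4))  # 0.0375
print(single_recombinant_resolution_cm(2000))            # 0.025
```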

The tomato genomic library in YACs was screened with the DNA marker TM-16 as a molecular probe. Two YAC clones, designated 271 and 310, were identified by hybridization. DNA sequences from the ends of the inserts in these YACs were amplified by PCR as previously described (Pnueli et al., 1998) and were used as molecular probes to screen the 32 recombinant plants for Restriction Fragment Length Polymorphism (RFLP). The YAC ends were mapped as shown in Figure 2. It was established that YAC 310 overlaps the B locus, showing that the 200 kb insert of YAC 310 contains the B gene. In contrast, recombination between YAC 271 and the B phenotype indicated that this clone did not carry the B locus. Moreover, it established that B resides in a confined, small region of YAC 310 that does not overlap with YAC 271 (Figure 2).

The DNA insert of YAC 310 was cut with EcoRI and the resulting fragments were subcloned in the vector λgt11. Two phage clones, designated B1 and B3, co-segregated with the B locus and mapped to the end of YAC 310. The nucleotide sequence of the insert of B1 was determined. The B1 fragment was further used to screen a genomic library of wild-type tomato (cv. VF36) in the lambda vector EMBL3 and a cosmid library of L. pennellii; a single positive phage clone and a single positive cosmid clone, respectively, were isolated.

The B1 fragment was also used to screen 1.5 million plaques of a cDNA library from tomato fruit, and 3 identical clones were isolated. Nucleotide sequence analysis showed that the ca. 1300 bp inserts in these clones contained an open reading frame lacking its 5' end. The full-length cDNAs were then obtained by reverse-transcription polymerase chain reaction (RT-PCR) with RNA isolated from flowers and fruits of wild-type (VF-36) and high-beta (IL 6-3) plants. For the PCR reaction we used 5' primers based on the genomic sequence of the B1 insert and 3' primers based on the cloned cDNA. The full coding regions of the cDNAs of allele b from wild-type tomato (cv. VF-36) and of allele B from L. pennellii were cloned into the pBluescript KS- vector, and the resulting plasmids were designated pBESC and pBPENN, respectively. Comparison of the cDNA and genomic sequences revealed no introns in the genomic region corresponding to the cDNA.

Table 1 below summarizes the sequence data with reference to the sequence listing:



TABLE 1

Type                             Allele   Species                          SEQ ID NO:
cDNA                             b        L. esculentum                    8
gDNA                             b        L. esculentum                    9
cDNA                             B        L. pennellii                     10
gDNA                             B        L. pennellii                     11
cDNA                             ogC      L. esculentum                    12
translated cDNA                  b/B      L. esculentum / L. pennellii     13
translated gDNA                  b        L. esculentum                    14
translated gDNA                  B        L. pennellii                     15
translated cDNA                  ogC      L. pennellii                     16
peptide (translated from cDNA)   b        L. esculentum                    17
peptide (translated from gDNA)   b        L. esculentum                    18
peptide (translated from cDNA)   B        L. pennellii                     19
peptide (translated from cDNA)   ogC      L. esculentum                    20

cDNA = complementary DNA; gDNA = genomic DNA; bp = base pairs; aa = amino acid.

Cloning and sequence analysis of the old-gold-crimson (ogC) mutation: Old-gold and crimson are two names given to a well-known recessive mutation that was found in the Philippines in 1951 (Butler, 1962; and the SolGenes database: http://probe.nal.usda.gov:8300/cgi-bin/webace?db=solgenes&class=Locus&object=og and http://probe.nal.usda.gov:8300/cgi-bin/webace?db=solgenes&class=Image&object=og%2c+old+gold). The ogC locus was mapped to chromosome 6. At least 2000 F2 progeny of a cross between High-beta (BB) and ogC were screened for B-ogC double mutants, and not a single recombinant plant was found. This places B and ogC less than 0.025 cM apart. The ogC phenotype is characterized by over-accumulation of lycopene in both fruits and flowers compared with wild-type tomatoes, and by a lack of β-carotene in the fruits.

Cloning of the B locus from ogC mutant plants was done by PCR on total genomic DNA extracted from ogC plants, using primers based on the sequence of the b allele described herein. Sequence analysis of the b homolog revealed a single-base deletion in the coding sequence of b at position 104 from the initiation codon (compare SEQ ID NOs: 13 and 16). This deletion creates a frame-shift mutation that shortens the translatable polypeptide to 56 amino acids, indicating that ogC is a null mutation of the b gene.
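The effect of the ogC deletion on the reading frame can be illustrated by translating the start of the b coding sequence with and without the base at position 104. The Python sketch below uses only the first 180 nucleotides of SEQ ID NO:8 and assumes that positions 111-120, which are illegible in the printed listing, read ATCAAGAAAA as in the corresponding region of SEQ ID NO:9. Under that assumption the deletion brings a TAG stop codon into frame at codon 57, leaving a 56-residue product, consistent with the figure stated above. The function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 180 nt of the b-allele coding sequence (SEQ ID NO:8), starting at the ATG.
B_CDS_PREFIX = (
    "ATGGAAGCTC" "TTCTCAAGCC" "TTTTCCATCT" "CTTTTACTTT" "CCTCTCCTAC"
    "ACCCCATAGG" "TCTATTTTCC" "AACAAAATCC" "CTCTTTTCTA" "AGTCCCACCA"
    "CCAAAAAAAA" "ATCAAGAAAA" "TGTCTTCTTA" "GAAACAAAAG" "TAGTAAACTT"
    "TTTTGTAGCT" "TTCTTGATTT" "AGCACCCACA"
)


def translate_until_stop(cds: str):
    """Translate frame 1; return (peptide, reached_stop). Trailing partial codons are ignored."""
    peptide = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            return "".join(peptide), True
        peptide.append(aa)
    return "".join(peptide), False


def delete_base(seq: str, position: int) -> str:
    """Delete the base at a 1-based position."""
    return seq[:position - 1] + seq[position:]


wild_type, wt_stop = translate_until_stop(B_CDS_PREFIX)
mutant, mut_stop = translate_until_stop(delete_base(B_CDS_PREFIX, 104))
print(len(wild_type), wt_stop)  # 60 False (no stop within the 180-nt prefix)
print(len(mutant), mut_stop)    # 56 True
```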

Sequence comparison of alleles in the B locus: Nucleotide sequence analysis of the 1666 bp cDNA revealed an open reading frame of 498 codons, potentially coding for a polypeptide of 498 amino acids with a calculated molecular mass of 56.4 kDa. The nucleotide sequences of b (from VF-36, SEQ ID NO: 8) and B (from L. pennellii, SEQ ID NO: 10) are 98% identical, and the amino acid sequences of B and b are 97.4% identical (SEQ ID NOs: 17 and 19).
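Percent identity figures such as these depend on how an alignment is scored. The sketch below computes identity from two already-aligned sequences of equal length, counting gapped columns as mismatches and ignoring columns that are gaps in both; this is one common convention and not necessarily the one applied by the UWGCG programs mentioned above. The alignment step itself is outside the scope of the sketch, and the function name and toy inputs are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns that are gaps in both sequences.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))           # 90.0
print(round(percent_identity("MEALL-KPF", "MEALLSKPF"), 1))  # 88.9
```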

In the 1200 bp upstream of the translated region of B from L. pennellii there are four sequence insertions compared with the equivalent region of b from VF-36. The inserts are 26, 13, 9 and 8 bp long and start (5' end) at nucleotides 859, 753, 479 and 306, respectively, of SEQ ID NOs: 11, 15. These insertions, located upstream of the initiator methionine codon in the B allele, are the main difference between the B and b alleles and are therefore responsible for the differential expression of the B locus in tomato. Their sequences are TGACTTCACCCTTCTTTCTTGTCTTC (SEQ ID NO:21), AGAGTCTGGGTTC (SEQ ID NO:22), CTAGTATCG (SEQ ID NO:23) and CTAAATAT (SEQ ID NO:24). An additional sequence, AATTTTCAAA (SEQ ID NO:25), which is found in upstream regions of ethylene-activated genes such as E4 and E8 (Montgomery et al., 1993), is shared by the upstream regions of the B and b alleles. All other sequences in the promoter region are 90-94% conserved between the two alleles (compare SEQ ID NOs: 9 and 11).
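Scanning a promoter sequence for a known element, such as the ethylene-associated AATTTTCAAA sequence (SEQ ID NO:25), is a simple exact-match search on both strands. The Python sketch below (function name illustrative; real promoter analysis would also allow degenerate or mismatched matches) finds that element in nucleotides 851-900 of SEQ ID NO:9 as printed in the sequence listing, and checks that the four insert sequences have the lengths stated above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(sequence: str, motif: str, first_position: int = 1) -> list:
    """Exact matches of `motif` on both strands as sorted (position, strand) tuples.

    Positions are 1-based coordinates of the leftmost base of each match on the
    given (top) strand; `first_position` is the coordinate of sequence[0].
    """
    seq = sequence.upper()
    patterns = [("+", motif.upper())]
    rc = reverse_complement(motif)
    if rc != motif.upper():  # a palindromic motif needs only one search
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + first_position, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Nucleotides 851-900 of SEQ ID NO:9 (upstream region of the b allele).
SEQ9_851_900 = "TAAAACCGAA" "TGAGCTGCAA" "GATTCAAGTC" "GAATTTTCAA" "AAGAATTGAC"
print(find_motif(SEQ9_851_900, "AATTTTCAAA", first_position=851))  # [(882, '+')]

# The four B-specific insertions (SEQ ID NOs:21-24) and their lengths.
INSERTS = {
    21: "TGACTTCACCCTTCTTTCTTGTCTTC",
    22: "AGAGTCTGGGTTC",
    23: "CTAGTATCG",
    24: "CTAAATAT",
}
print({seq_id: len(s) for seq_id, s in INSERTS.items()})  # {21: 26, 22: 13, 23: 9, 24: 8}
```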

The polypeptide products of B and b are β-carotene synthases: The E. coli heterologous system for carotenoid biosynthesis has proven to be a powerful tool for identifying genes associated with carotenoid biosynthesis. E. coli cells of the strain XL1-Blue carrying the plasmid pACCRT-EIB accumulate lycopene (Cunningham et al., 1993). Lycopene-accumulating E. coli cells were co-transformed with the plasmid pBESC or pBPENN and selected on LB medium containing both ampicillin and chloramphenicol. Carotenoids from cells carrying pACCRT-EIB alone, or pACCRT-EIB together with either pBESC or pBPENN, were extracted and analyzed by HPLC.

Cells carrying only the pACCRT-EIB plasmid produced lycopene, while cells carrying both pACCRT-EIB and pBPENN also accumulated β-carotene, up to 13% of total carotenoids. Similarly, cells carrying both pACCRT-EIB and pBESC produced β-carotene up to 5% of total carotenoids (see Table 2 below). These results indicate that the cDNA products of both the B and b alleles are capable of converting lycopene to β-carotene by the symmetric formation of two β-ionone rings on the linear lycopene molecule.

TABLE 2

The B gene product converts lycopene to β-carotene. Accumulation of carotenoids in E. coli cells expressing alleles B or b from tomato (percent of total carotenoids)

Plasmid                      Lycopene   β-Carotene
pACCRT-EIB                   100        -
pACCRT-EIB + pBESC (b)       87         13
pACCRT-EIB + pBPENN (B)      95         5
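Table 2 reports each pigment as a percentage of total carotenoids. The Python sketch below shows how such values could be derived from integrated HPLC peak areas; it assumes that every carotenoid gives the same detector response per unit amount, which the text does not state, and the peak areas used are hypothetical rather than data from this study. The function name is illustrative.

```python
def percent_of_total(peak_areas: dict) -> dict:
    """Express each integrated peak area as a percentage of the summed area.

    Assumes equal detector response for all carotenoids; the text does not say
    whether areas were corrected for differing extinction coefficients.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical peak areas (arbitrary units), not data from Table 2.
areas = {"lycopene": 450.0, "beta-carotene": 50.0}
print(percent_of_total(areas))  # {'lycopene': 90.0, 'beta-carotene': 10.0}
```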



Sequence comparison between B and other carotene cyclases: The nucleotide sequences of the coding region of b and of the coding region of the cDNA of the previously published tomato lycopene β-cyclase, CrtL-b (Pecker et al., 1996), are 59% identical, and the polypeptide products of these genes are only 52% identical. These data explain why CrtL-b could not hybridize with the sequence of B. Moreover, while the similarity in amino acid sequence between B and CRTLB suggests a common mechanism of lycopene cyclization, the low degree of identity clearly demonstrates that B is a novel lycopene β-cyclase enzyme. There is no similarity (less than 45% identity) between the non-translated regions of these two genes.

Surprisingly, the nucleotide sequence of the b cDNA is 83% identical to the cDNA of a bell pepper (Capsicum annuum) gene whose product catalyzes the conversion of the ubiquitous 5,6-epoxycarotenoids antheraxanthin and violaxanthin into the ketocarotenoids capsanthin and capsorubin, respectively (Bouvier et al., 1994). This enzyme, also called capsanthin-capsorubin synthase (CCS), is synthesized specifically in pepper fruits. The deduced amino acid sequences of B and CCS are 85% identical.

Expression of the B gene during fruit ripening in wild-type and High-beta tomato: It has previously been shown that the steady-state mRNA levels of the genes for early enzymes of the carotenoid biosynthesis pathway, phytoene synthase and phytoene desaturase, increase during fruit ripening in tomato (Hirschberg et al., 1997). In the case of Pds, transcriptional up-regulation was shown to be responsible for this increase (reviewed in Hirschberg et al., 1997). We have recently determined that the mRNA level of CrtL-b, which encodes lycopene β-cyclase, decreases during tomato fruit ripening (Pecker et al., 1996).

To determine how expression of the B gene is regulated during fruit development in tomato, we measured its mRNA level by RT-PCR at different stages of fruit development. As can be seen in Figure 3, mRNA of the b gene is undetectable in leaves and during the green stages of fruit ripening in wild-type tomato. It increases at the 'breaker' stage of ripening but then disappears at later stages. This marked drop in the mRNA of B contrasts with the dramatic increase in the mRNA level of Psy at the same stages of fruit ripening.

In contrast to wild-type tomato, the mRNA level of B in the fruit of the High-beta mutant (which carries the B allele) increases dramatically at the 'breaker' stage and remains high during all subsequent ripening stages (Figure 4). These results indicate that the major difference between alleles b and B is in their level of expression at different ripening stages. The results further explain the phenotype of the High-beta mutant, which carries the B allele: a novel type of lycopene cyclase, capable of converting lycopene to β-carotene, is highly expressed during fruit ripening.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
WHAT IS CLAIMED IS:

1. An isolated complementary or genomic DNA segment
comprising a nucleotide sequence coding for a polypeptide having an amino 
acid sequence selected from the group consisting of SEQ ID NOs: 17, 18 
and 19 and functional naturally occurring and man-induced variants thereof, 
with the provision that said polypeptide has a major lycopene cyclase 
catalytic activity. 

2. The isolated DNA segment of claim 1, wherein said 
nucleotide sequence is selected from the group consisting of SEQ ID NOs: 
8, 9, 10 and 11 and functional naturally occurring and man-induced variants
thereof. 

3. The isolated DNA segment of claim 1, wherein said 
nucleotide sequence is a cDNA or a genomic DNA isolated from tomato.

4. An isolated complementary or genomic DNA segment comprising a nucleotide sequence selected from the group consisting of SEQ ID NOs: 8, 9, 10 and 11.

5. A polypeptide comprising an amino acid sequence selected 
from the group consisting of SEQ ID NOs: 17, 18 and 19 and functional 
naturally occurring and man-induced variants thereof, said polypeptide 
having a major lycopene cyclase catalytic activity. 

6. A transduced cell overexpressing a polypeptide including an 
amino acid sequence selected from the group consisting of SEQ ID NOs: 
17, 18 and 19 and functional naturally occurring and man-induced variants 
thereof, said polypeptide having a major lycopene cyclase catalytic activity, 
the cell therefore overproducing β-carotene at the expense of lycopene.

7. The transduced cell of claim 6, selected from the group 
consisting of a prokaryotic cell and a eukaryotic cell. 

8. The transduced cell of claim 7, wherein said eukaryotic cell is 
of a higher plant. 




9. The transduced cell of claim 6, wherein the cell forms a part of a transgenic plant.

10. A method of down-regulating production of β-carotene in a cell comprising the step of introducing into the cell at least one anti-sense polynucleotide sequence capable of base pairing with messenger RNA coding for a polypeptide including an amino acid sequence selected from the group consisting of SEQ ID NOs: 17, 18 and 19 and functional naturally occurring and man-induced variants thereof, said polypeptide having a major lycopene cyclase catalytic activity, the cell therefore underproducing β-carotene from lycopene.

11. The method of claim 10, wherein said at least one anti-sense polynucleotide sequence includes a synthetic oligonucleotide.

12. The method of claim 11, wherein said synthetic oligonucleotide includes a man-made modification rendering said synthetic oligonucleotide more stable in a cell environment.

13. The method of claim 11, wherein said synthetic 
oligonucleotide is selected from the group consisting of methylphosphonate oligonucleotide, monothiophosphate oligonucleotide, dithiophosphate oligonucleotide, phosphoramidate oligonucleotide, phosphate ester oligonucleotide, bridged phosphorothioate oligonucleotide, bridged phosphoramidate oligonucleotide, bridged methylenephosphonate oligonucleotide, dephospho internucleotide analogs with siloxane bridges, carbonate bridge oligonucleotide, carboxymethyl ester bridge oligonucleotide, acetamide bridge oligonucleotide, carbamate bridge oligonucleotide, thioether bridge oligonucleotide, sulfoxy bridge oligonucleotide, sulfono bridge oligonucleotide and α-anomeric bridge oligonucleotide.

14. The method of claim 10, wherein said at least one anti-sense 
polynucleotide sequence is encoded by an expression vector. 



15. The method of claim 10, wherein said cell is selected from the 
group consisting of a prokaryotic cell and a eukaryotic cell. 
16. The method of claim 15, wherein said eukaryotic cell is of a
higher plant. 

17. The method of claim 15, wherein the cell forms a part of a
transgenic plant. 

18. An expression construct for directing expression of a gene in fruit or flower, comprising a regulatory sequence selected from the group consisting of an upstream region of a B allele of tomato and an upstream region of a b allele of tomato.

19. The expression construct of claim 18, comprising a functional part of nucleotides 1-1210 of SEQ ID NO: 14 or nucleotides 1-1600 of SEQ ID NO: 15, or functional naturally occurring and man-induced variants thereof.

20. The expression construct of claim 18, comprising at least one control element having a sequence selected from the group consisting of SEQ ID NOs:21-24, all derived from SEQ ID NO:11, and functional naturally occurring and man-induced variants thereof.

21. The expression construct of claim 18, wherein the expression 
construct is selected from the group consisting of plasmid, cosmid, phage, 
virus, bacmid and artificial chromosome. 

22. The expression construct of claim 18, designed to integrate 
into a genome of a host. 

23. A method of isolating a gene encoding a polypeptide having 
an amino acid sequence homologous to SEQ ID NOs: 17, 18 and 19 and 
having a major lycopene cyclase catalytic activity from a species, the 
method comprising the step of screening a complementary or genomic DNA 
library prepared from isolated RNA or genomic DNA extracted from said
species with a probe having a sequence derived from SEQ ID NOs: 8, 9, 10
or 11 and isolating clones reacting with said probe.
24. A transduced cell transduced with the expression construct of
claim 18. 


25. A transgenic plant transduced with the expression construct of 
claim 18. 
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Joseph Hirschberg et al.

(ii) TITLE OF INVENTION: POLYNUCLEOTIDES CONTROLLING THE EXPRESSION OF AND CODING FOR GENE B IN TOMATO AND USE OF SAME FOR ALTERING CAROTENOID BIOSYNTHESIS

(iii) NUMBER OF SEQUENCES: 25

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Mark M. Friedman c/o Anthony Castorina
(B) STREET: 20001 Jefferson Davis Highway, Suite 207
(C) CITY: Arlington
(D) STATE: Virginia
(E) COUNTRY: United States of America
(F) ZIP: 22202

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: 1.44 megabyte, 3.5" microdisk
(B) COMPUTER: Twinhead, Slimnote 890TX
(C) OPERATING SYSTEM: MS DOS version 6.2, Windows version 3.11
(D) SOFTWARE: Word for Windows version 2.0, converted to ASCII

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Friedman, Mark M.
(B) REGISTRATION NUMBER: 33,883
(C) REFERENCE/DOCKET NUMBER: 325/12

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 972-3-5625553
(B) TELEFAX: 972-3-5625554
(C) TELEX:



(2) INFORMATION FOR SEQ ID NO : 1 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AATGGAAGCT CTTCTCAAGC CT 22



(2) INFORMATION FOR SEQ ID NO : 2 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

CACATTCAAA GGCTCTCTAT CGC 23
(2) INFORMATION FOR SEQ ID NO : 3 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 

TCGAGAACGG ACGATG 16 

(2) INFORMATION FOR SEQ ID NO : 4 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

TGCAGAGAGA CAGATG 16 

(2) INFORMATION FOR SEQ ID NO : 5 :

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

ATTTCATGCT TTATCTTTGA AG 22 

(2) INFORMATION FOR SEQ ID NO : 6 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 : 

GCTGAAGTTG AAATTGTTGA 20

(2) INFORMATION FOR SEQ ID NO : 7 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7 : 

TCTCTTCCTC AATAACACTT 20

(2) INFORMATION FOR SEQ ID NO : 8 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1666 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

ATGGAAGCTC TTCTCAAGCC TTTTCCATCT CTTTTACTTT CCTCTCCTAC 50
ACCCCATAGG TCTATTTTCC AACAAAATCC CTCTTTTCTA AGTCCCACCA 100
CCAAAAAAAA ATCAAGAAAA TGTCTTCTTA GAAACAAAAG TAGTAAACTT 150
TTTTGTAGCT TTCTTGATTT AGCACCCACA TCAAAGCCAG AGTCTTTAGA 200
TGTTAACATC TCATGGGTTG ATCCTAATTC GAATCGGGCT CAATTCGACG 250
TGATCATTAT CGGAGCTGGC CCTGCTGGGC TCAGGCTAGC TGAACAAGTT 300
TCTAAATATG GTATTAAGGT ATGTTGTGTT GACCCTTCAC CACTCTCCAT 350
GTGGCCAAAT AATTATGGTG TTTGGGTTGA TGAGTTTGAG AATTTAGGAC 400
TGGAAAATTG TTTAGATCAT AAATGGCCTA TGACTTGTGT GCATATAAAT 450
GATAACAAAA CTAAGTATTT GGGAAGACCA TATGGTAGAG TTAGTAGAAA 500
GAAGCTGAAG TTGAAATTGT TGAATAGTTG TGTTGAGAAC AGAGTGAAGT 550
TTTATAAAGC TAAGGTTTGG AAAGTGGAAC ATGAAGAATT TGAGTCTTCA 600
ATTGTTTGTG ATGATGGTAA GAAGATAAGA GGTAGTTTGG TTGTGGATGC 650
AAGTGGTTTT GCTAGTGATT TTATAGAGTA TGACAGGCCA AGAAACCATG 700
GTTATCAAAT TGCTCATGGG GTTTTAGTAG AAGTTGATAA TCATCCATTT 750
GATTTGGATA AAATGGTGCT TATGGATTGG AGGGATTCTC ATTTGGGTAA 800
TGAGCCATAT TTAAGGGTGA ATAATGCTAA AGAACCAACA TTCTTGTATG 850
CAATGCCATT TGATAGAGAT TTGGTTTTCT TGGAAGAGAC TTCTTTGGTG 900
AGTCGTCCTG TTTTATCGTA TATGGAAGTA AAAAGAAGGA TGGTGGCAAG 950
ATTAAGGCAT TTGGGGATCA AAGTGAAAAG TGTTATTGAG GAAGAGAAAT 1000
GTGTGATCCC TATGGGAGGA CCACTTCCGC GGATTCCTCA AAATGTTATG 1050
GCTATTGGTG GGAATTCAGG GATAGTTCAT CCATCAACAG GGTACATGGT 1100
GGCTAGGAGC ATGGCTTTAG CACCAGTACT AGCTGAAGCC ATCGTCGAGG 1150
GGCTTGGCTC AACAAGAATG ATAAGAGGGT CTCAACTTTA CCATAGAGTT 1200
TGGAATGGTT TGTGGCCTTT GGATAGAAGA TGTGTTAGAG AATGTTATTC 1250
ATTTGGGATG GAGACATTGT TGAAGCTTGA TTTGAAAGGG ACTAGGAGAT 1300
TGTTTGACGC TTTCTTTGAT CTTGATCCTA AATACTGGCA AGGGTTCCTT 1350
TCTTCAAGAT TGTCTGTCAA AGAACTTGGT TTACTCAGCT TGTGTCTTTT 1400
CGGACATGGC TCAAACATGA CTAGGTTGGA TATTGTTACA AAATGTCCTC 1450
TTCCTTTGGT TAGACTGATT GGCAATCTAG CAATAGAGAG CCTTTGAATG 1500
TGAAAAGTTT GAATCATTTT CTTCATTTTA ATTTCTTTGA TTATTTTCAT 1550
ATTTTCTCAA TTGCAAAAGT GAGATAAGAG CTACATACTG TCAACAAATA 1600
AACTACTATT GGAAAGTTAA AATATGTGTT TGTTGTATGT TATTCTAATG 1650
GAATGGATTT TGTAAA 1666

(2) INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2876

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GAATTCTCTG AAAAGGAGCA CCATATTTGC CGCACTGTGG TTCATATTTC 50
CAAGTACATT TAGATGAACT ATATCATCAG ATTGAAAGGT TATTGTATAA 100
TCT^TCCAGT GGATTCTCGT TCTGGCACCT TTAGAAGTAC ATGTGCGGAA 150
AAGAATGATA AGGTTTGTAT TGTTGTTGAC AAAGCCTGTT GCCTTTCTCA 200
TTTGTAAATG TTCTGAACGA CTCCTAAATT ACTCTTAAGG TGTAAGGTCT 250
TCCGTGCCTG TTTGTAAATA TAATGCTGTG CCGTGACTTA CCTTTTGTAC 300
CATTTGTTCA AATGTATGGC CTGAACACCA GGGTTGTCAA AAATGTCTCA 350
TGCCCGTTTT ATTGGTCTGA AAATGGCGTG ATGCCAAATT CTGCCGCTCC 400
ACAGTGAGCA TTTCGATCTA CTGGAAATTG ACCAACTTAT TTTATCACTT 450
GATAACTAAA CTU^AATCCTA TTAACTTTAA TCATACATTG TATTTATACC 500
GAAAAATTTA TGCATAACTC ATTAAATTAC CTTTTTTAGC AGTCAAATTC 550
TAAATCAGTT TCTAATTTAT CAAAATGGCT TTTATAGGGT CCCATTTCCA 600
CTAATATACC TGCCGTCCAT GCACTGACTA CAAAACAAAT ACCTCACTAT 650
GTTTGTTAGT GCTTGGTAAT ATAAAACCTT TTCTTTTATG AGAAAGTTCA 700
CCGAGAATAA TTTTCTATTT GTGGCATAAT AGTATATAGT GCAGATTGAC 750
AAGAATTTAA TTTTGCAGTT GGGCACATGA ACAATTTTCC TCAAAGTTGT 800
AGAAAGTACT TTTCATTTTC TTGTCACCGA AAATTATTTA TAATTGAAAT 850
TAAAACCGAA TGAGCTGCAA GATTCAAGTC GAATTTTCAA AAGAATTGAC 900
CAAGAAAAAA TTCAAAAATA TCCCCCACCC CCTACCAAAC ACATCCTAAA 950
GTGAGGTATA GACTGGGACT GGGATTGGGA AAAGGGTAAA ATGCTTTCAC 1000
TAGCTTAGCA AAGATTCCAC TTTGTTAGCT ATCTTTCTTT CTCATTTCCT 1050
TTTTTCTTTT ycT'pTTi'TTT GTTATATAAG CCAAAGTAGG TACCCAAAAG 1100
CATCAATATT TTGTATTGCT TGGTGATTCC TCTGTAGTCC AGTATTTCAT 1150
TTTCTACAAG TTCCACCTCC CTCCATAATT AACCATTATC AATCTTATAC 1200
ATTCTCTATA ATGGAAACTC TTCTCAAGCC TTTTCCATCT CTTTTACTTT 1250
CCTCTCCTAC ACCCCATAGG TCTATTTTCC AACAAAATCC CTCTTTTCTA 1300
AGTCCCACCA CCAAAAAAAA ATCAAGAAAA TGTCTTCTTA GAAACAAAAG 1350
TAGTAAACTT TTTTGTAGCT TTCTTGATTT AGCACCCACA TCAAAGCCAG 1400
AGTCTTTAGA TGTTAACATC TCATGGGTTG ATCCTAATTC GAATCGGGCT 1450
CAATTCGACG TGATCATTAT CGGAGCTGGC CCTGCTGGGC TCAGGCTAGC 1500
TGAACAAGTT TCTAAATATG GTATTAAGGT ATGTTGTGTT GACCCTTCAC 1550
CACTCTCCAT GTGGCCAAAT AATTATGGTG TTTGGGTTGA TGAGTTTGAG 1600
AATTTAGGAC TGGAAAATTG TTTAGATCAT AAATGGCCTA TGACTTGTGT 1650
GCATATAAAT GATAACAAAA CTAAGTATTT GGGAAGACCA TATGGTAGAG 1700
TTAGTAGAAA GAAGCTGAAG TTGAAATTGT TGAATAGTTG TGTTGAGAAC 1750
AGAGTGAAGT TTTATAAAGC TAAGGTTTGG AAAGTGGAAC ATGAAGAATT 1800
TGAGTCTTCA ATTGTTTGTG ATGATGGTAA GAAGATAAGA GGTAGTTTGG 1850
TTGTGGATGC AAGTGGTTTT GCTAGTGATT TTATAGAGTA TGACAGGCCA 1900
AGAAACCATG GTTATCAAAT TGCTCATGGG GTTTTAGTAG AAGTTGATAA 1950
TCATCCATTT GATTTGGATA AAATGGTGCT TATGGATTGG AGGGATTCTC 2000
ATTTGGGTAA TGAGCCATAT TTAAGGGTGA ATAATGCTAA AGAACCAACA 2050
TTCTTGTATG CAATGCCATT TGATAGAGAT TTGGTTTTCT TGGAAGAGAC 2100
TTCTTTGGTG AGTCGTCCTG TTTTATCGTA TATGGAAGTA AAAAGAAGGA 2150
TGGTGGCAAG ATTAAGGCAT TTGGGGATCA AAGTGAAAAG TGTTATTGAG 2200
GAAGAGAAAT GTGTGATCCC TATGGGAGGA CCACTTCCGC GGATTCCTCA 2250
AAATGTTATG GCTATTGGTG GGAATTCAGG GATAGTTCAT CCATCAACAG 2300
GGTACATGGT GGCTAGGAGC ATGGCTTTAG CACCAGTACT AGCTGAAGCC 2350
ATCGTCGAGG GGCTTGGCTC AACAAGAATG ATAAGAGGGT CTCAACTTTA 2400
CCATAGAGTT TGGAATGGTT TGTGGCCTTT GGATAGAAGA TGTGTTAGAG 2450
AATGTTATTC ATTTGGGATG GAGACATTGT TGAAGCTTGA TTTGAAAGGG 2500
ACTAGGAGAT TGTTTGACGC TTTCTTTGAT CTTGATCCTA AATACTGGCA 2550
AGGGTTCCTT TCTTCAAGAT TGTCTGTCAA AGAACTTGGT TTACTCAGCT 2600
TGTGTCTTTT CGGACATGGC TCAAACATGA CTAGGTTGGA TATTGTTACA 2650
AAATGTCCTC TTCCTTTGGT TAGACTGATT GGCAATCTAG CAATAGAGAG 2700
CCTTTGAATG TGAAAAGTTT GAATCATTTT CTTCATTTTA ATTTCTTTGA 2750
TTATTTTCAT ATTTTCTCAA TTGCAAAAGT GAGATAAGAG CTACATACTG 2800
TCAACAAATA AACTACTATT GGAAAGTTAA AATATGTGTT TGTTGTATGT 2850
TATTCTAATG GAATGGATTT TGTAAA 2876



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1740

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

ATGGAAGCTC TTCTCAAGCC TTTTCCATCT CTTTTACTTT CCTCTCCTAC    50
ACCCTATAGG TCTATTGTCC AACAAAATCC TTCTTTTCTA AGTCCCACCA   100
CCAAAAAAAA TCAAGAAAAT GTCTTCTTAG AAACAAAAGT AGTAAACTTT   150
TTTGTAGCTT TCTTGATTTA GCACCCACAT CAAAGCCAGA GTCTTTAAAT   200
GTTAACATCT CATGGGTTGA TCCTAATTCG AATCGGGCTC AATTCGACGT   250
GATCATTATC GGAGCTGGCC CTGCTGGGCT CAGGCTAGCT GAACAAGTTT   300
CTAAATATGG TATTAAGGTA TGTTGTGTTG ACCCTTCACC ACTCTCCATG   350
TGGCCAAATA ATTATGGTGT TTGGGTTGAT GAGTTTGAGA ATTTAGGACT   400
GGAAAATTGT TTAGATCATA AATGGCCTAT GACTTGTGTG CATATAAATG   450
ATAACAAAAC TAAGTATTTG GGAAGACCAT ATGGTAGAGT TAGTAGAAAG   500
AAGCTGAAGT TGAAATTGTT GAATAGTTGT GTTGAGAACA GAGTGAAGTT   550
TTATAAAGCT AAGGTTTGGA AAGTGGAACA TGAAGAATTT GAGTCTTCAA   600
TTGTTTGTGA TGATGGTAAG AAGATAAGAG GTAGTTTGGT TGTGGATGCA   650
AGTGGTTTTG CTAGTGATTT TATAGAGTAT GACAGGCCAA GAAACCATGG   700
TTATCAAATT GCTCATGGGG TTTTAGTAGA AGTTGATAAT CATCCATTTG   750
ATTTGGATAA AATGGTGCTT ATGGATTGGA GGGATTCTCA TTTGGGTAAT   800
GAGCCATATT TAAGGGTGAA TAATGCTAAA GAACCAACAT TCTTGTATGC   850
AATGCCATTT GATAGAGATT TGGTTTTCTT GGAAGAGACT TCTTTGGTGA   900
GTCGTCCTGT GTTATCGTAT ATGGAAGTAA AAAGAAGGAT GGTGGCAAGA   950
TTAAGGCATT TGGGGATCAA AGTGAAAAGT GTTATTGAGG AAGAGAAATG  1000
TGTGATCCCT ATGGGAGGAC CACTTCCGCG GATTCCTCAA AATGTTATGG  1050
CTATTGGTGG GAATTCAGGG ATAGTTCATC CATCAACAGG GTACATGGTG  1100
GCTAGGAGCA TGGCTTTAGC ACCAGTACTA GCTGAAGCCA TCGTCGAGGG  1150
GCTTGGCTCA ACAAGAATGA TAAGAGGGTC TCAACTTTAC CATAGAGTTT  1200
GGAATGGTTT GTGGCCTTTG GATAGAAGAT GTGTTAGAGA ATGTTATTCA  1250
TTTGGGATGG AGACATTGTT GAAGCTTGAT TTGAAAGGGA CTAGGAGATT  1300
GTTTGACGCT TTCTTTGATC TTGATCCTAA ATACTGGCAA GGGTTCCTTT  1350
CTTCAAGATT GTCTGTCAAA GAAACTTGGT TTACTCAGCT TGTGTCTTTT  1400
CGGACATGGC TCAAACATGA CTAGGTTGGG ATATTGTTAC AAAATGTCCT  1450
CTTCCTTTGG TTAGACTGAT TGGCAATCTA GCAATAGAGA GCCTTTGAAA  1500
TGTGAAAAGT TTGAATCATT TTCTTCATTT TAATTTCTTT GATTATTTTC  1550
ATATTTTCTC AATTGCAGAA TGAGATAAAA ACTACATACT GTCGACAAAT  1600
AAACTACTAT TGGAANGTTA AAATAATGTG TGTGTTGNAT GTTANGCCTA  1650
ATGGAANGGA TGNGGTTANG CAATTTATGA ACTGNNCGCT CTGTTCGCTT  1700
AAAANCCTTG GTTCCACCTT AANGGAANGG NCCGGCCATT             1740



WO 00/08920                                 PCT/US99/18327

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2897

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TGGTTCATAT TTCCAATTAC ATTTAGATGA ACTATATCAT CAGGAGTGAA    50
AGGTTATTGT ATAATCAATC CAGTGGATTC TCGTTCTGGC ACCTTTAGAA   100
GTACATGTGC GGAAAAGAAT GATAAGGTTT GTATTGTTGT TGACAAGGCC   150
TGTTGCCTTT CTCATTTGTA AATGTTCTGA ACGACTCCTA AATTACTCTT   200
AAAGTGTAAG GTCTTCCGTG CCTGTTTGTA TATATAATGC TGTGCCGTGA   250
CTTACCTTTT GTACCATTTG TTCAAATGTA TGGCCTGGAC ACTAGGGTTG   300
TCAAAAATGT CTCATGACTT CACCCTTCTT TCTTGTCTTG GTGCCCGTTT   350
TATTGGTCTG AGAACGGCGT GATGCCAAAT TCTGCCGCTC CACAGTGAGC   400
ATTTCGATCT ACTGGAAATT GACCAACTTA TTTTATCACT TGATAACTAG   450
AGTCTGGGTT CAAACAAAAT CCAATAACTT CAATCATACA TTGTATTTAT   500
ATTGAAAAAA TTATGCACAA CTCAGTAAAT TACCTTTTTT TGCAGTCAAA   550
AATTCTAGAT CAGTTTCTAA TTAATCAAAA TGGCCTTTAT AGGGTCCCAG   600
TTCCATTAAT ATACCTGCCG TCCATGCACT GATTACAAGA CAAATACCTC   650
ACTATGTTTG TTAGTGCTTG GTAATATAAA ACCTTTTCTT TTATGAGAAA   700
GTTCACCGAA AATAATTTTC TATTTGTGGC ATAACTAGTA TCGAAGTATA   750
TAGTGCAGAT TGACAAGAAT TTAATTTTGC AGTTGGGCAC ATGAACAATT   800
TTCCTCAAAG TTGTAGAAAA TATTTTTCAT TTTCTTGTCA CCGAAAATTA   850
TTTATAATTG AAATTGAAAC CGAATGAGCT GCAAGACTCG AGTCGAATTT   900
CAAAAAAATT GACCAACTAA ATATGAAAAA ATCCGAATAT ATCCCCCACC   950
CCCTACCAAA CACATCCTAA AGTGAGGTAT AGACTGGGAC TGGGATTGGG  1000
AAAAGGGTAA AATGCTTTCA CTAGCTTAGC AAAGATTCCA CTTTGTTAGC  1050
TATCTTTCTT TCTCATTTCC TTTTTTCTTT t-tCTTTTTTT TGTTATATAA  1100
GCCAAAGTAG GTACCCAAAA GCATCAATAT TTTGTATTGC TTGGTGATTC  1150
CTCTTTACTC CAGTATTTCA TTTTCTACAA GTTCCACCTC CCTCCATAAT  1200
TAACCATTAT CAATCTTATA CATTTTCTAT AATGGAAACT CTTCTCAAGC  1250
CTTTTCCATC TCTTTTACTT TCCTCTCCTA CACCCTATAG GTCTATTGTC  1300
CAACAAAATC CTTCTTTTCT AAGTCCCACC ACCCAAAAAA AATCAAGAAA  1350
ATGTCTTCTT AGAAACAAAA GTAGTAAACT TTTTTGTAGC TTTCTTGATT  1400
TAGCACCCAC ATCAAAGCCA GAGTCTTTAA ATGTTAACAT CTCATGGGTT  1450
GATCCTAATT CTGGTCGGGC TCAATTCGAC GTGATCATTA TCGGAGCTGG  1500
CCCTGCTGGG CTCAGGTTAG CTGAACAAGT TTCTAAATAT GGTATTAAGG  1550
TATGTTGTGT TGACCCTTCA CCACTCTCCA TGTGGCCAAA TAATTATGGT  1600
GTTTGGGTTG ATGAGTTTGA GAATTTAGGA CTGGAAGATT GTTTAGATCA  1650
TAAATGGCCT ATGACTTGTG TGCATATAAA TGATAACAAG ACTAAGTATT  1700
TGGGAAGACC ATATGGTAGA GTTAGTAGAA AGAAGCTGAA GTTGAAATTG  1750
TTGAACAGTT GTGTTGAGAA CAGAGTGAAG TTTTATAAAG CTAAGGTTTG  1800
GAAAGTGGAA CATGAAGAAT TTGAGTCTTC AATTGTTTGT GATGATGGTA  1850
AGAAGATAAG AGGTAGTTTG GTTGTGGATG CAAGTGGTTT TGCTAGTGAT  1900
TTTATAGAGT ATGACAAGCC AAGAAACCAT GGTTATCAAA TTGCTCATGG  1950
GGTTTTAGTA GAAGTTGATA ATCATCCATT TGATTTGGAT AAAATGGTGC  2000
TTATGGATTG GAGGGATTCT CATTTAGGTA ATGAGCCATA TTTAAGGGTG  2050
AATAATGCTA AAGAACCAAC ATTCTTGTAT GCAATGCCAT TTGATAGAAA  2100
TTTGGTTTTC TTGGAAGAGA CTTCTTTGGT GAGTCGTCCT GTGTTATCGT  2150
ATATGGAAGT AAAAAGAAGG ATGGTGGCAA GATTAAGGCA TTTGGGGATC  2200
AAAGTCAGA^ GTGTTATTGA GGAAGAGAAA TGTGTGATCC CTATGGGAGG  2250
ACCACTTCCG CGGATTCCTC AAAATGTTAT GGCTATTGGT GGGAATTCAG  2300
GGATAGTTCA TCCATCAACG GGGTACATGG TGGCTAGGAG CATGGCTTTA  2350
GCACCAGTAC TAGCTGAAGC CATCGTCGAG GGGCTTGGCT CAACAAGAAT  2400
GATAAGAGGG TCTCAACTTT ACCATAGAGT TTGGAATGGT TTGTGGCCTT  2450
TGGATAGAAG ATGTGTTAGA GAATGTTATT CATTTGGGAT GGAGACATTG  2500
TTGAAGCTTG ATTTGAAAGG GACTAGGAGA TTGTTTGACG CTTTCTTTGA  2550
TCTTGATCCT AAATACTGGC AAGGGTTCCT TTCTTCAAGA TTGTCTGTCA  2600
AAGAACTTGG TTTACTCAGC TTGTGTCTTT TCGGACATGG CTCAAATTTG  2650
ACTAGGTTGG ATATTGTTAC AAAATGTCCT GTTCCTTTGG TTAGACTGAT  2700
TGGCAATCTA GCAGTAGAGA GCCTTTGAAT GTGAAAAGTT TGAATCATTT  2750
TCTTTATTTT AATTTCTTTG ATTATTTTCA TATTTTCTCA ATGCAAAAGT  2800
GAGAGAAGAC TATACACTGT CAACAAATAA ACTACTATTG GAAAGTTAAA  2850
ATAATGTGTG TGTTGTATGT TATGCTAATG GAATGGATTG GTGTAT^     2897



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2740

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

ATGGAAGCTC TTCTCAAGCC TTTTCCATCT CTTTTACTTT CCTCTCCTAC    50
ACCCTATAGG TCTATTGTCC AACAAAATCC TTCTTTTCTA AGTCCCACCA   100
           TCAAGAAAAT GTCTTCTTAG AAACAAAAGT AGTAAACTTT   150
TTTGTAGCTT TCTTGATTTA GCACCCACAT CAAAGCCAGA GTCTTTAAAT   200
GTTAACATCT CATGGGTTGA TCCTAATTCG AATCGGGCTC AATTCGACGT   250
GATCATTATC GGAGCTGGCC CTGCTGGGCT CAGGCTAGCT GAACAAGTTT   300
CTAAATATGG TATTAAGGTA TGTTGTGTTG ACCCTTCACC ACTCTCCATG   350
TGGCCAAATA ATTATGGTGT TTGGGTTGAT GAGTTTGAGA ATTTAGGACT   400
GGAAAATTGT TTAGATCATA AATGGCCTAT GACTTGTGTG CATATAAATG   450
ATAACAAAAC TAAGTATTTG GGAAGACCAT ATGGTAGAGT TAGTAGAAAG   500
AAGCTGAAGT TGAAATTGTT GAATAGTTGT GTTGAGAACA GAGTGAAGTT   550
TTATAAAGCT AAGGTTTGGA AAGTGGAACA TGAAGAATTT GAGTCTTCAA   600
TTGTTTGTGA TGATGGTAAG AAGATAAGAG GTAGTTTGGT TGTGGATGCA   650
AGTGGTTTTG CTAGTGATTT TATAGAGTAT GACAGGCCAA GAAACCATGG   700
TTATCAAATT GCTCATGGGG TTTTAGTAGA AGTTGATAAT CATCCATTTG   750
ATTTGGATAA AATGGTGCTT ATGGATTGGA GGGATTCTCA TTTGGGTAAT   800
GAGCCATATT TAAGGGTGAA TAATGCTAAA GAACCAACAT TCTTGTATGC   850
AATGCCATTT GATAGAGATT TGGTTTTCTT GGAAGAGACT TCTTTGGTGA   900
GTCGTCCTGT GTTATCGTAT ATGGAAGTAA AAAGAAGGAT GGTGGCAAGA   950
TTAAGGCATT TGGGGATCAA AGTGAAAAGT GTTATTGAGG AAGAGAAATG  1000
TGTGATCCCT ATGGGAGGAC CACTTCCGCG GATTCCTCAA AATGTTATGG  1050
CTATTGGTGG GAATTCAGGG ATAGTTCATC CATCAACAGG GTACATGGTG  1100
GCTAGGAGCA TGGCTTTAGC ACCAGTACTA GCTGAAGCCA TCGTCGAGGG  1150
GCTTGGCTCA ACAAGAATGA TAAGAGGGTC TCAACTTTAC CATAGAGTTT  1200
GGAATGGTTT GTGGCCTTTG GATAGAAGAT GTGTTAGAGA ATGTTATTCA  1250
TTTGGGATGG AGACATTGTT GAAGCTTGAT TTGAAAGGGA CTAGGAGATT  1300
GTTTGACGCT TTGTTTGATC TTGATCCTAA ATACTGGCAA GGGTTCCTTT  1350
CTTCAAGATT GTCTGTCAAA GAAACTTGGT TTACTCAGCT TGTGTCTTTT  1400
CGGACATGGC TCAAACATGA CTAGGTTGGG ATATTGTTAC AAAATGTCCT  1450
CTTCCTTTGG TTAGACTGAT TGGCAATCTA GCAATAGAGA GCCTTTGAAA  1500
TGTGAAAAGT TTGAATCATT TTCTTCATTT TAATTTCTTT GATTATTTTC  1550
ATATTTTCTC AATTGCAGAA TGAGATAAAA ACTACATACT GTCGACAAAT  1600
AAACTACTAT TGGAANGTTA AAATAATGTG TGTGTTGNAT GTTANGCCTA  1650
ATGGAANGGA TGNGGTTANG CAATTTATGA ACTGNNCGCT CTGTTCGCTT  1700
AAAANCCTTG GTTCCACCTT AANGGAANGG NCCGGCCATT             1740



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1666

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ATG GAA GCT CTT CTC AAG CCT TTT CCA TCT CTT TTA CTT TCC TCT     45
Met Glu Ala Leu Leu Lys Pro Phe Pro Ser Leu Leu Leu Ser Ser
  1               5                  10                  15

CCT ACA CCC CAT AGG TCT ATT TTC CAA CAA AAT CCC TCT TTT CTA     90
Pro Thr Pro His Arg Ser Ile Phe Gln Gln Asn Pro Ser Phe Leu
                 20                  25                  30

AGT CCC ACC ACC AAA AAA AAA TCA AGA AAA TGT CTT CTT AGA AAC    135
Ser Pro Thr Thr Lys Lys Lys Ser Arg Lys Cys Leu Leu Arg Asn
                 35                  40                  45

AAA AGT AGT AAA CTT TTT TGT AGC TTT CTT GAT TTA GCA CCC ACA    180
Lys Ser Ser Lys Leu Phe Cys Ser Phe Leu Asp Leu Ala Pro Thr
                 50                  55                  60

TCA AAG CCA GAG TCT TTA GAT GTT AAC ATC TCA TGG GTT GAT CCT    225
Ser Lys Pro Glu Ser Leu Asp Val Asn Ile Ser Trp Val Asp Pro
                 65                  70                  75

AAT TCG AAT CGG GCT CAA TTC GAC GTG ATC ATT ATC GGA GCT GGC    270
Asn Ser Asn Arg Ala Gln Phe Asp Val Ile Ile Ile Gly Ala Gly
                 80                  85                  90

CCT GCT GGG CTC AGG CTA GCT GAA CAA GTT TCT AAA TAT GGT ATT    315
Pro Ala Gly Leu Arg Leu Ala Glu Gln Val Ser Lys Tyr Gly Ile
                 95                 100                 105

AAG GTA TGT TGT GTT GAC CCT TCA CCA CTC TCC ATG TGG CCA AAT    360
Lys Val Cys Cys Val Asp Pro Ser Pro Leu Ser Met Trp Pro Asn
                110                 115                 120

AAT TAT GGT GTT TGG GTT GAT GAG TTT GAG AAT TTA GGA CTG GAA    405
Asn Tyr Gly Val Trp Val Asp Glu Phe Glu Asn Leu Gly Leu Glu
                125                 130                 135

AAT TGT TTA GAT CAT AAA TGG CCT ATG ACT TGT GTG CAT ATA AAT    450
Asn Cys Leu Asp His Lys Trp Pro Met Thr Cys Val His Ile Asn
                140                 145                 150

GAT AAC AAA ACT AAG TAT TTG GGA AGA CCA TAT GGT AGA GTT AGT    495
Asp Asn Lys Thr Lys Tyr Leu Gly Arg Pro Tyr Gly Arg Val Ser
                155                 160                 165

AGA AAG AAG CTG AAG TTG AAA TTG TTG AAT AGT TGT GTT GAG AAC    540
Arg Lys Lys Leu Lys Leu Lys Leu Leu Asn Ser Cys Val Glu Asn
                170                 175                 180

AGA GTG AAG TTT TAT AAA GCT AAG GTT TGG AAA GTG GAA CAT GAA    585
Arg Val Lys Phe Tyr Lys Ala Lys Val Trp Lys Val Glu His Glu
                185                 190                 195

GAA TTT GAG TCT TCA ATT GTT TGT GAT GAT GGT AAG AAG ATA AGA    630
Glu Phe Glu Ser Ser Ile Val Cys Asp Asp Gly Lys Lys Ile Arg
                200                 205                 210

GGT AGT TTG GTT GTG GAT GCA AGT GGT TTT GCT AGT GAT TTT ATA    675
Gly Ser Leu Val Val Asp Ala Ser Gly Phe Ala Ser Asp Phe Ile
                215                 220                 225

GAG TAT GAC AGG CCA AGA AAC CAT GGT TAT CAA ATT GCT CAT GGG    720
Glu Tyr Asp Arg Pro Arg Asn His Gly Tyr Gln Ile Ala His Gly
                230                 235                 240

GTT TTA GTA GAA GTT GAT AAT CAT CCA TTT GAT TTG GAT AAA ATG    765
Val Leu Val Glu Val Asp Asn His Pro Phe Asp Leu Asp Lys Met
                245                 250                 255

GTG CTT ATG GAT TGG AGG GAT TCT CAT TTG GGT AAT GAG CCA TAT    810
Val Leu Met Asp Trp Arg Asp Ser His Leu Gly Asn Glu Pro Tyr
                260                 265                 270

TTA AGG GTG AAT AAT GCT AAA GAA CCA ACA TTC TTG TAT GCA ATG    855
Leu Arg Val Asn Asn Ala Lys Glu Pro Thr Phe Leu Tyr Ala Met
                275                 280                 285

CCA TTT GAT AGA GAT TTG GTT TTC TTG GAA GAG ACT TCT TTG GTG    900
Pro Phe Asp Arg Asp Leu Val Phe Leu Glu Glu Thr Ser Leu Val
                290                 295                 300

AGT CGT CCT GTT TTA TCG TAT ATG GAA GTA AAA AGA AGG ATG GTG    945
Ser Arg Pro Val Leu Ser Tyr Met Glu Val Lys Arg Arg Met Val
                305                 310                 315

GCA AGA TTA AGG CAT TTG GGG ATC AAA GTG AAA AGT GTT ATT GAG    990
Ala Arg Leu Arg His Leu Gly Ile Lys Val Lys Ser Val Ile Glu
                320                 325                 330

GAA GAG AAA TGT GTG ATC CCT ATG GGA GGA CCA CTT CCG CGG ATT   1035
Glu Glu Lys Cys Val Ile Pro Met Gly Gly Pro Leu Pro Arg Ile
                335                 340                 345

CCT CAA AAT GTT ATG GCT ATT GGT GGG AAT TCA GGG ATA GTT CAT   1080
Pro Gln Asn Val Met Ala Ile Gly Gly Asn Ser Gly Ile Val His
                350                 355                 360

CCA TCA ACA GGG TAC ATG GTG GCT AGG AGC ATG GCT TTA GCA CCA   1125
Pro Ser Thr Gly Tyr Met Val Ala Arg Ser Met Ala Leu Ala Pro
                365                 370                 375

GTA CTA GCT GAA GCC ATC GTC GAG GGG CTT GGC TCA ACA AGA ATG   1170
Val Leu Ala Glu Ala Ile Val Glu Gly Leu Gly Ser Thr Arg Met
                380                 385                 390

ATA AGA GGG TCT CAA CTT TAC CAT AGA GTT TGG AAT GGT TTG TGG   1215
Ile Arg Gly Ser Gln Leu Tyr His Arg Val Trp Asn Gly Leu Trp
                395                 400                 405

CCT TTG GAT AGA AGA TGT GTT AGA GAA TGT TAT TCA TTT GGG ATG   1260
Pro Leu Asp Arg Arg Cys Val Arg Glu Cys Tyr Ser Phe Gly Met
                410                 415                 420

GAG ACA TTG TTG AAG CTT GAT TTG AAA GGG ACT AGG AGA TTG TTT   1305
Glu Thr Leu Leu Lys Leu Asp Leu Lys Gly Thr Arg Arg Leu Phe
                425                 430                 435

GAC GCT TTC TTT GAT CTT GAT CCT AAA TAC TGG CAA GGG TTC CTT   1350
Asp Ala Phe Phe Asp Leu Asp Pro Lys Tyr Trp Gln Gly Phe Leu
                440                 445                 450

TCT TCA AGA TTG TCT GTC AAA GAA CTT GGT TTA CTC AGC TTG TGT   1395
Ser Ser Arg Leu Ser Val Lys Glu Leu Gly Leu Leu Ser Leu Cys
                455                 460                 465

CTT TTC GGA CAT GGC TCA AAC ATG ACT AGG TTG GAT ATT GTT ACA   1440
Leu Phe Gly His Gly Ser Asn Met Thr Arg Leu Asp Ile Val Thr
                470                 475                 480

AAA TGT CCT CTT CCT TTG GTT AGA CTG ATT GGC AAT CTA GCA ATA   1485
Lys Cys Pro Leu Pro Leu Val Arg Leu Ile Gly Asn Leu Ala Ile
                485                 490                 495

GAG AGC CTT TGA ATG TGA AAA GTT TGA ATC ATT TTC TTC ATT TTA   1530
Glu Ser Leu
        498

ATT TCT TTG ATT ATT TTC ATA TTT TCT CAA TTG CAA AAG TGA GAT   1575
AAG AGC TAC ATA CTG TCA ACA AAT AAA CTA CTA TTG GAA AGT TAA   1620
AAT ATG TGT TTG TTG TAT GTT ATT CTA ATG GAA TGG ATT TTG TAA   1665







(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2876

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

  G AAT TCT CTG AAA AGG AGC ACC ATA TTT GCC GCA CTG TGG TTC     43
ATA     CCA AGT ACA TTT AGA TGA ACT ATA TCA TCA GAT TGA AAG     88
GTT ATT GTA TAA TCA ATC CAG TGG ATT CTC GTT CTG GCA CCT TTA    133
GAA GTA CAT GTG CGG AAA AGA ATG ATA AGG TTT GTA TTG TTG TTG    178
            GTT GCC     CTC ATT TGT AAA TGT TCT GAA CGA CTC    223
CTA AAT TAC TCT TAA GGT GTA AGG TCT TCC GTG CCT GTT TGT AAA    268
TAT AAT GCT GTG CCG TGA CTT ACC TTT TGT ACC ATT TGT TCA AAT    313
GTA TGG CCT GAA CAC CAG GGT TGT CAA AAA TGT CTC ATG CCC GTT    358
TTA TTG GTC TGA AAA TGG CGT GAT GCC AAA TTC TGC CGC TCC ACA    403
GTG AGC ATT TCG ATC TAC TGG AAA TTG ACC AAC TTA TTT TAT CAC    448
TTG ATA ACT AAA CAA AAT CCT ATT AAC TTT AAT CAT ACA TTG TAT    493
TTA TAC CGA AAA ATT TAT GCA TAA CTC ATT AAA TTA CCT TTT TTA    538
GCA GTC AAA TTC TAA ATC AGT TTC TAA TTT ATC AAA ATG GCT TTT    583
ATA GGG TCC CAT TTC CAC TAA TAT ACC TGC CGT CCA TGC ACT GAC    628
TAG AAA ACA AAT ACC TCA CTA TGT TTG TTA GTG CTT GGT AAT ATA    673
AAA CCT TTT CTT TTA TGA GAA AGT TCA CCG AGA ATA ATT TTC TAT    718
TTG TGG CAT AAT AGT ATA TAG TGC AGA TTG ACA AGA ATT TAA TTT    763
TGC AGT TGG GCA CAT GAA CAA TTT TCC TCA AAG TTG TAG AAA GTA    808
CTT TTC ATT TTC TTG TCA CCG AAA ATT ATT TAT AAT TGA AAT TAA    853
AAC CGA ATG AGC TGC AAG ATT CAA GTC GAA TTT TCA AAA GAA TTG    898
ACC AAG AAA AAA TTC AAA AAT ATC CCC CAC CCC CTA CCA AAC ACA    943
TCC TAA AGT GAG GTA TAG ACT GGG ACT GGG ATT GGG AAA AGG GTA    988
AAA TGC TTT CAC TAG CTT AGC AAA GAT TCC ACT TTG TTA CCT ATC   1033
TTT CTT TCT CAT TTC CTT TTT TCT TTT TCT         TGT TAT ATA   1078
AGC CAA AGT AGG TAC CCA AAA GCA TCA ATA TTT TGT ATT GCT TGG   1123
TGA TTC CTC TGT AGT CCA GTA TTT CAT TTT CTA CAA GTT CTT CCT   1168
CCC TCC ATA ATT AAC CAT TAT CAA TCT TAT ACA TTC TCT ATA ATG   1213
                                                        Met
                                                          1

GAA ACT CTT CTC AAG CCT TTT CCA TCT CTT TTA CTT TCC TCT CCT   1258
Glu Thr Leu Leu Lys Pro Phe Pro Ser Leu Leu Leu Ser Ser Pro
              5                  10                  15

ACA CCC CAT AGG TCT ATT TTC CAA CAA AAT CCC TCT TTT CTA AGT   1303
Thr Pro His Arg Ser Ile Phe Gln Gln Asn Pro Ser Phe Leu Ser
             20                  25                  30

CCC ACC ACC AAA AAA AAA TCA AGA AAA TGT CTT CTT AGA AAC AAA   1348
Pro Thr Thr Lys Lys Lys Ser Arg Lys Cys Leu Leu Arg Asn Lys
             35                  40                  45

AGT AGT AAA CTT TTT TGT AGC TTT CTT GAT TTA GCA CCC ACA TCA   1393
Ser Ser Lys Leu Phe Cys Ser Phe Leu Asp Leu Ala Pro Thr Ser
             50                  55                  60

AAG CCA GAG TCT TTA GAT GTT AAC ATC TCA TGG GTT GAT CCT AAT   1438
Lys Pro Glu Ser Leu Asp Val Asn Ile Ser Trp Val Asp Pro Asn
             65                  70                  75

TCG AAT CGG GCT CAA TTC GAC GTG ATC ATT ATC GGA GCT GGC CCC   1483
Ser Asn Arg Ala Gln Phe Asp Val Ile Ile Ile Gly Ala Gly Pro
             80                  85                  90

GCT GGG CTC AGG CTA GCT GAA CAA GTT TCT AAA TAT GGT ATT AAG   1528
Ala Gly Leu Arg Leu Ala Glu Gln Val Ser Lys Tyr Gly Ile Lys
             95                 100                 105

GTA TGT TGT GTT GAC CCT TCA CCA CTC TCC ATG TGG CCA AAT AAT   1573
Val Cys Cys Val Asp Pro Ser Pro Leu Ser Met Trp Pro Asn Asn
            110                 115                 120

TAT GGT GTT TGG GTT GAT GAG TTT GAG AAT TTA GGA CTG GAA AAT   1618
Tyr Gly Val Trp Val Asp Glu Phe Glu Asn Leu Gly Leu Glu Asn
            125                 130                 135

TGT TTA GAT CAT AAA TGG CCT ATG ACT TGT GTG CAT ATA AAT GAT   1663
Cys Leu Asp His Lys Trp Pro Met Thr Cys Val His Ile Asn Asp
            140                 145                 150

AAC AAA ACT AAG TAT TTG GGA AGA CCA TAT GGT AGA GTT AGT AGA   1708
Asn Lys Thr Lys Tyr Leu Gly Arg Pro Tyr Gly Arg Val Ser Arg
            155                 160                 165

AAG AAG CTG AAG TTG AAA TTG TTG AAT AGT TGT GTT GAG AAC AGA   1753
Lys Lys Leu Lys Leu Lys Leu Leu Asn Ser Cys Val Glu Asn Arg
            170                 175                 180

GTG AAG TTT TAT AAA GCT AAG GTT TGG AAA GTG GAA CAT GAA GAA   1798
Val Lys Phe Tyr Lys Ala Lys Val Trp Lys Val Glu His Glu Glu
            185                 190                 195

TTT GAG TCT TCA ATT GTT TGT GAT GAT GGT AAG AAG ATA AGA GGT   1843
Phe Glu Ser Ser Ile Val Cys Asp Asp Gly Lys Lys Ile Arg Gly
            200                 205                 210

AGT TTG GTT GTG GAT GCA AGT GGT TTT GCT AGT GAT TTT ATA GAG   1888
Ser Leu Val Val Asp Ala Ser Gly Phe Ala Ser Asp Phe Ile Glu
            215                 220                 225

TAT GAC AGG CCA AGA AAC CAT GGT TAT CAA ATT GCT CAT GGG GTT   1933
Tyr Asp Arg Pro Arg Asn His Gly Tyr Gln Ile Ala His Gly Val
            230                 235                 240

TTA GTA GAA GTT GAT AAT CAT CCA TTT GAT TTG GAT AAA ATG GTG   1978
Leu Val Glu Val Asp Asn His Pro Phe Asp Leu Asp Lys Met Val
            245                 250                 255

CTT ATG GAT TGG AGG GAT TCT CAT TTG GGT AAT GAG CCA TAT TTA   2023
Leu Met Asp Trp Arg Asp Ser His Leu Gly Asn Glu Pro Tyr Leu
            260                 265                 270

AGG GTG AAT AAT GCT AAA GAA CCA ACA TTC TTG TAT GCA ATG CCA   2068
Arg Val Asn Asn Ala Lys Glu Pro Thr Phe Leu Tyr Ala Met Pro
            275                 280                 285

TTT GAT AGA GAT TTG GTT TTC TTG GAA GAG ACT TCT TTG GTG AGT   2113
Phe Asp Arg Asp Leu Val Phe Leu Glu Glu Thr Ser Leu Val Ser
            290                 295                 300

CGT CCT GTT TTA TCG TAT ATG GAA GTA AAA AGA AGG ATG GTG GCA   2158
Arg Pro Val Leu Ser Tyr Met Glu Val Lys Arg Arg Met Val Ala
            305                 310                 315

AGA TTA AGG CAT TTG GGG ATC AAA GTG AAA AGT GTT ATT GAG GAA   2203
Arg Leu Arg His Leu Gly Ile Lys Val Lys Ser Val Ile Glu Glu
            320                 325                 330

GAG AAA TGT GTG ATC CCT ATG GGA GGA CCA CTT CCG CGG ATT CCT   2248
Glu Lys Cys Val Ile Pro Met Gly Gly Pro Leu Pro Arg Ile Pro
            335                 340                 345

CAA AAT GTT ATG GCT ATT GGT GGG AAT TCA GGG ATA GTT CAT CCA   2293
Gln Asn Val Met Ala Ile Gly Gly Asn Ser Gly Ile Val His Pro
            350                 355                 360

TCA ACA GGG TAC ATG GTG GCT AGG AGC ATG GCT TTA GCA CCA GTA   2338
Ser Thr Gly Tyr Met Val Ala Arg Ser Met Ala Leu Ala Pro Val
            365                 370                 375

CTA GCT GAA GCC ATC GTC GAG GGG CTT GGC TCA ACA AGA ATG ATA   2383
Leu Ala Glu Ala Ile Val Glu Gly Leu Gly Ser Thr Arg Met Ile
            380                 385                 390

AGA GGG TCT CAA CTT TAC CAT AGA GTT TGG AAT GGT TTG TGG CCT   2428
Arg Gly Ser Gln Leu Tyr His Arg Val Trp Asn Gly Leu Trp Pro
            395                 400                 405

TTG GAT AGA AGA TGT GTT AGA GAA TGT TAT TCA TTT GGG ATG GAG   2473
Leu Asp Arg Arg Cys Val Arg Glu Cys Tyr Ser Phe Gly Met Glu
            410                 415                 420

ACA TTG TTG AAG CTT GAT TTG AAA GGG ACT AGG AGA TTG TTT GAC   2518
Thr Leu Leu Lys Leu Asp Leu Lys Gly Thr Arg Arg Leu Phe Asp
            425                 430                 435

GCT TTC TTT GAT CTT GAT CCT AAA TAC TGG CAA GGG TTC CTT TCT   2563
Ala Phe Phe Asp Leu Asp Pro Lys Tyr Trp Gln Gly Phe Leu Ser
            440                 445                 450

TCA AGA TTG TCT GTC AAA GAA CTT GGT TTA CTC AGC TTG TGT CTT   2608
Ser Arg Leu Ser Val Lys Glu Leu Gly Leu Leu Ser Leu Cys Leu
            455                 460                 465

TTC GGA CAT GGC TCA AAC ATG ACT AGG TTG GAT ATT GTT ACA AAA   2653
Phe Gly His Gly Ser Asn Met Thr Arg Leu Asp Ile Val Thr Lys
            470                 475                 480

TGT CCT CTT CCT TTG GTT AGA CTG ATT GGC AAT CTA GCA ATA GAG   2698
Cys Pro Leu Pro Leu Val Arg Leu Ile Gly Asn Leu Ala Ile Glu
            485                 490                 495

AGC CTT TGA ATG TGA AAA GTT TGA ATC ATT TTC TTC ATT TTA ATT   2743
Ser Leu
    498

TCT TTG ATT ATT TTC ATA TTT TCT CAA TTG CAA AAG TGA GAT AAG   2788
AGC TAC ATA CTG TCA ACA AAT AAA CTA CTA TTG GAA AGT TAA AAT   2833
ATG TGT TTG TTG TAT GTT ATT CTA ATG GAA TGG ATT TTG TAA A     2876



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3265

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
ATC TCA TTG TAT AGC TTG TCT TTT GTT TCA GTC GTC TTA GGC TTG 45
GGT TAG TTG GTG TTG CTG TTT CAT ACT TCT ATC AAC CTT GTG TGA 90
GTT CCT TTA TAA AAT ATG ACT GTT GGA GGA AGT AAT TTA CCT TTA 135
GTT CGA CTA CAT CAA GAT TTG CAT CAT TCT CGT CCA AGA AAT CTT 180
AGT TTG AAG CCT TTT GGT CTG GTA TAT TTG TCA ATC TGA GCT TCG 225
CAA CTT TCT CAT GAC AGG GGT TTG TTG ACA TGC CTG ATT GTG CTC 270
TTC CTT TAC TTG ATA ATT GCT GCT TGT TGC GGA GGC ATC ACT CTA 315
CCT TCC TGC AGA TCA TGA ATT CTC TGA AAA GGA GCA CCA TAT TTG 360
CCG CAC TGT GGT TCA TAT TTC CAA TTA CAT TTA GAT GAA CTA TAT 405
CAT CAG GAG TGA AAG GTT ATT GTA TAA TCA ATC CAG TGG ATT CTC 450
GTT CTG GCA CCT TTA GAA GTA CAT GTG CGG AAA AGA ATG ATA AGG 495
TTT GTA TTG TTG TTG ACA AGG CCT GTT GCC TTT CTC ATT TGT AAA 540
TGT TCT GAA CGA CTC CTA AAT TAC TCT TAA AGT GTA AGG TCT TCC 585
GTG CCT GTT TGT ATA TAT AAT GCT GTG CCG TGA CTT ACC TTT TGT 630
ACC ATT TGT TCA AAT GTA TGG CCT GGA CAC TAG GGT TGT CAA AAA 675
TGT CTC ATG ACT TCA CCC TTC TTT CTT GTC TTG GTG CCC GTT TTA 720
TTG GTC TGA GAA CGG CGT GAT GCC AAA TTC TGC CGC TCC ACA GTG 765
AGC ATT TCG ATC TAC TGG AAA TTG ACC AAC TTA TTT TJ^T CAC TTG 810
ATA ACT AGA GTC TGG GTT CAA ACA AAA TCC AAT AAC TTC AAT CAT 855
ACA TTG TAT TTA TAT TGA AAA AAT TAT GCA CAA CTC AGT AAA TTA 900
CCT TTT TTT GCA GTC liAlK AAT TCT AGA TCA GTT TCT AAT TAA TCA 945
AAA TGG CCT TTA TAG GGT CCC AGT TCC ATT AAT ATA CCT GCC GTC 990
CAT GCA CTG ATT ACA AGA CAA ATA CCT CAC TAT GTT TGT TAG TGC 1035
TTG GTA ATA TAA AAC CTT TTC TTT TAT GAG AAA GTT CAC CGA AAA 1080
TAA TTT TCT ATT TCT GGC ATA ACT AGT ATC GAA GTA TAT AGT GCA 1125
GAT TGA CAA GAA TTT AAT TTT GCA GTT GGG CAC ATG AAC AAT TTT 1170
CCT CAA AGT TGT AGA AAA TAT TTT TCA TTT TCT TGT CAC CGA AAA 1215
TTA TTT ATA ATT GAA ATT GAA ACC GAA TGA GCT GCA AGA CTC GAG 1260
TCG AAT TTC AAA AAA ATT GAC CAA CTA AAT ATG AAA AAA TCC GAA 1305
TAT ATC CCC CAC CCC CTA CCA AAC ACA TCC TAA AGT GAG GTA TAG 1350
ACT GGG ACT GGG ATT GGG AAA AGG GTA AAA TGC TTT CAC TAG CTT 1395
AGC AAA GAT TCC ACT TTG TTA GCT ATC TTT CTT TCT CAT TTC CTT 1440
TTT TCT TTT TCT TTT TTT TGT TAT ATA AGC CAA AGT AGG TAC CCA 1485
AAA GCA TCA ATA TTT TGT ATT GCT TGG TGA TTC CTC TTT ACT CCA 1530
GTA TTT CAT TTT CTA CAA GTT CCA CCT CCC TCC ATA ATT AAC CAT 1575
TAT CAA TCT TAT ACA TTT TCT ATA ATG GAA ACT CTT CTC T^G CCT 1620

Met Glu Thr Leu Leu Lys Pro
5

TTT CCA TCT CTT TTA CTT TCC TCT CCT ACA CCC TAT AGG TCT ATT 1665
Phe Pro Ser Leu Leu Leu Ser Ser Pro Thr Pro Tyr Arg Ser Ile

10 15 20

GTC CAA CAA AAT CCT TCT TTT CTA AGT CCC ACC ACC CAA AAA AAA 1710
Val Gln Gln Asn Pro Ser Phe Leu Ser Pro Thr Thr Gln Lys Lys

25 30 35

TCA AGA AAA TGT CTT CTT AGA AAC MA AGT AGT AAA CTT TTT TGT 1755
Ser Arg Lys Cys Leu Leu Arg Asn Lys Ser Ser Lys Leu Phe Cys

40 45 50

AGC TTT CTT GAT TTA GCA CCC ACA TCA AAG CCA GAG TCT TTA AAT 1800
Ser Phe Leu Asp Leu Ala Pro Thr Ser Lys Pro Glu Ser Leu Asn

55 60 65

GTT AAC ATC TCA TGG GTT GAT CCT AAT TCT GGT CGG GCT CAA TTC 1845
Val Asn Ile Ser Trp Val Asp Pro Asn Ser Gly Arg Ala Gln Phe

70 75 80

GAC GTG ATC ATT ATC GGA GCT GGC CCT GCT GGG CTC AGG TTA GCT 1890
Asp Val Ile Ile Ile Gly Ala Gly Pro Ala Gly Leu Arg Leu Ala

85 90 95

GAA CAA GTT TCT AAA TAT GGT ATT AAG GTA TGT TGT GTT GAC CCT 1935
Glu Gln Val Ser Lys Tyr Gly Ile Lys Val Cys Cys Val Asp Pro

100 105 110

TCA CCA CTC TCC ATG TGG CCA AAT AAT TAT GGT GTT TGG GTT GAT 1980
Ser Pro Leu Ser Met Trp Pro Asn Asn Tyr Gly Val Trp Val Asp

115 120 125

GAG TTT GAG AAT TTA GGA CTG GAA GAT TGT TTA GAT CAT AAA TGG 2025
Glu Phe Glu Asn Leu Gly Leu Glu Asp Cys Leu Asp His Lys Trp

130 135 140

CCT ATG ACT TGT GTG CAT ATA AAT GAT AAC AAG ACT AAG TAT TTG 2070
Pro Met Thr Cys Val His Ile Asn Asp Asn Lys Thr Lys Tyr Leu

145 150 155

WO 00/08920

GGA AGA CCA TAT GGT AGA GTT AGT AGA AAG AAG CTG AAG TTG AAA 2115
Gly Arg Pro Tyr Gly Arg Val Ser Arg Lys Lys Leu Lys Leu Lys

160 165 170

TTG TTG AAC AGT TGT GTT GAG AAC AGA GTG AAG TTT TAT AAA GCT 2160
Leu Leu Asn Ser Cys Val Glu Asn Arg Val Lys Phe Tyr Lys Ala

175 180 185

AAG GTT TGG AAA GTG GAA CAT GAA GAA TTT GAG TCT TCA ATT GTT 2205
Lys Val Trp Lys Val Glu His Glu Glu Phe Glu Ser Ser Ile Val

190 195 200

TGT GAT GAT GGT AAG AAG ATA AGA GGT AGT TTG GTT GTG GAT GCA 2250
Cys Asp Asp Gly Lys Lys Ile Arg Gly Ser Leu Val Val Asp Ala

205 210 215

AGT GGT TTT GCT AGT GAT TTT ATA GAG TAT GAC AAG CCA AGA T^C 2295
Ser Gly Phe Ala Ser Asp Phe Ile Glu Tyr Asp Lys Pro Arg Asn

220 225 230

CAT GGT TAT CAA ATT GCT CAT GGG GTT TTA GTA GAA GTT GAT AAT 2340
His Gly Tyr Gln Ile Ala His Gly Val Leu Val Glu Val Asp Asn

235 240 245

CAT CCA TTT GAT TTG GAT AAA ATG GTG CTT ATG GAT TGG AGG GAT 2385
His Pro Phe Asp Leu Asp Lys Met Val Leu Met Asp Trp Arg Asp

250 255 260

TCT CAT TTA GGT AAT GAG CCA TAT TTA AGG GTG AAT AAT GCT AAA 2430
Ser His Leu Gly Asn Glu Pro Tyr Leu Arg Val Asn Asn Ala Lys

265 270 275

GAA CCA ACA TTC TTG TAT GCA ATG CCA TTT GAT AGA AAT TTG GTT 2475
Glu Pro Thr Phe Leu Tyr Ala Met Pro Phe Asp Arg Asn Leu Val

280 285 290

TTC TTG GAA GAG ACT TCT TTG GTG AGT CGT CCT GTG TTA TCG TAT 2520
Phe Leu Glu Glu Thr Ser Leu Val Ser Arg Pro Val Leu Ser Tyr

295 300 305

ATG GAA GTA T^AA AGA AGG ATG GTG GCA AGA TTA AGG CAT TTG GGG 2565
Met Glu Val Lys Arg Arg Met Val Ala Arg Leu Arg His Leu Gly

310 315 320

ATC AAA GTG AGA AGT GTT ATT GAG GAA GAG AAA TGT GTG ATC CCT 2610
Ile Lys Val Arg Ser Val Ile Glu Glu Glu Lys Cys Val Ile Pro

325 330 335

ATG GGA GGA CCA CTT CCG CGG ATT CCT CAA AAT GTT ATG GCT ATT 2655
Met Gly Gly Pro Leu Pro Arg Ile Pro Gln Asn Val Met Ala Ile

340 345 350
GGT GGG AAT TCA GGG ATA GTT CAT CCA TCA ACG GGG TAC ATG GTG 2700
Gly Gly Asn Ser Gly Ile Val His Pro Ser Thr Gly Tyr Met Val

355 360 365

GCT AGG AGC ATG GCT TTA GCA CCA GTA CTA GCT GAA GCC ATC GTC 2745
Ala Arg Ser Met Ala Leu Ala Pro Val Leu Ala Glu Ala Ile Val

370 375 380

GAG GGG CTT GGC TCA ACA AGA ATG ATA AGA GGG TCT CAA CTT TAC 2790
Glu Gly Leu Gly Ser Thr Arg Met Ile Arg Gly Ser Gln Leu Tyr

385 390 395

CAT AGA GTT TGG AAT GGT TTG TGG CCT TTG GAT AGA AGA TGT GTT 2835
His Arg Val Trp Asn Gly Leu Trp Pro Leu Asp Arg Arg Cys Val

400 405 410

AGA GAA TGT TAT TCA TTT GGG ATG GAG ACA TTG TTG AAG CTT GAT 2880
Arg Glu Cys Tyr Ser Phe Gly Met Glu Thr Leu Leu Lys Leu Asp

415 420 425

TTG AAA GGG ACT AGG AGA TTG TTT GAC GCT TTC TTT GAT CTT GAT 2925
Leu Lys Gly Thr Arg Arg Leu Phe Asp Ala Phe Phe Asp Leu Asp

430 435 440

CCT AAA TAC TGG CAA GGG TTC CTT TCT TCA AGA TTG TCT GTC AAA 2970
Pro Lys Tyr Trp Gln Gly Phe Leu Ser Ser Arg Leu Ser Val Lys

445 450 455

GAA CTT GGT TTA CTC AGC TTG TGT CTT TTC GGA CAT GGC TCA AAT 3015
Glu Leu Gly Leu Leu Ser Leu Cys Leu Phe Gly His Gly Ser Asn

460 465 470

TTG ACT AGG TTG GAT ATT GTT ACA AAA TGT CCT GTT CCT TTG GTT 3060
Leu Thr Arg Leu Asp Ile Val Thr Lys Cys Pro Val Pro Leu Val

475 480 485

AGA CTG ATT GGC AAT CTA GCA GTA GAG AGC CTT TGA ATG TGA T^AA 3105
Arg Leu Ile Gly Asn Leu Ala Val Glu Ser Leu

490 495 498

GTT TGA ATC ATT TTC TTT ATT TTA ATT TCT TTG ATT ATT TTC ATA 3150
TTT TCT CAA TGC AAA AGT GAG AGA AGA CTA TAC ACT GTC AAC AAA 3195
TAA ACT ACT ATT GGA AAG TTA PiPA TAA TGT GTG TGT TGT ATG TTA 3240
TGC TAA TGG AAT GGA TTG GTG TAA A 3265







PCT/US99/1S327 



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1740

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:



ATG GAA GCT CTT CTC AAG CCT TTT CCA TCT CTT TTA CTT TCC i V. i 45
Met Glu Ala Leu Leu Lys Pro Phe Pro Ser Leu Leu Leu Ser Ser

5 10 15

CCT AC/* CCC TAT AGG TCT ATT GTC CAA CAA AAT CCT TCT TTT CTA 90
Pro Thr Pro Tyr Arg Ser Ile Val Gln Gln Asn Pro Ser Phe Leu

20 25 30

AGT CCC ACC ACC AAA AAA AAT AAT GTC TTC TTA GAA ACA 135
Ser Pro Thr Thr Lys Lys Asn Gln Glu Asn Val Phe Leu Glu Thr

35 40 45

AAA GTA GTA AAC TTT TTT GTA GCT TTC TTG ATT TAG CAC CCA CAT 180
Lys Val Val Asn Phe Phe Val Ala Phe Leu Ile

50 56

CAA AGC CAG AGT CTT TAA ATG TTA ACA TCT CAT GGG TTG ATC CTA 225
^TT CGA ATC GGG CTC AAT TCG ACG TG^ TCA TTA TCG GAG CTG GCC 270
CTG CTG GGC TCA GGC TAG CTG AAC AAG TTT CTA AAT ATG GTA TTA 315
AGG TAT GTT GTG TTG ACC CTT CAC CAC TCT CCA TGT GGC CAA ATA 360
ATT ATG GTG TTT GGG TTG ATG AGT TTG AGA ATT TAG GAC TGG AAA 405
ATT GTT TAG ATC ATA AAT GGC CTA TGA CTT GTG TGC ATA TAA ATG 450
ATA ACA AAA CTA AGT ATT TGG GAA GAC CAT ATG GTA GAG TTA GTA 495
GAA AGA AGC TGA AGT TGA AAT TGT TGA ATA GTT GTG TTG AGA ACA 540
GAG TGA AGT TTT ATA AAG CTA AGG TTT GGA AAG TGG AAC ATG AAG 585
AAT TTG AGT CTT CAA TTG TTT GTG ATG ATG GTA AGA AGA TAA GAG 630
GTA GTT TGG TTG TGG ATG CAA GTG GTT TTG CTA GTG ATT TTA TAG 675
AGT ATG ACA GGC CAA GAA ACC ATG GTT ATC AAA TTG CTC ATG GGG 720
TTT TAG TAG AAG TTG ATA ATC ATC CAT TTG ATT TGG ATA AAA TGG 765
TGC TTA TGG ATT GGA GGG ATT CTC ATT TGG GTA ATG AGC CAT ATT 810
TAA GGG TGA ATA ATG CTA AAG AAC CAA CAT TCT TGT ATG CAA TGC 855
CAT TTG ATA GAG ATT TGG TTT TCT TGG AAG AGA CTT CTT TGG TGA 900
GTC GTC CTG TGT TAT CGT ATA TGG AAG TAA AAA GAA GGA TGG TGG 945
CAA GAT TAA GGC ATT TGG GGA TCA AAG TGA AAA GTG TTA TTG AGG 990
AAG AGA AAT GTG TGA TCC CTA TGG GAG GAC CAC TTC CGC GGA TTC 1035
CTC AAA ATG TTA TGG CTA TTG GTG GGA ATT CAG GGA TAG TTC ATC 1080
CAT CAA CAG GGT ACA TGG TGG CTA GGA GCA TGG CTT TAG CAC CAG 1125
TAC TAG CTG AAG CCA TCG TCG AGG GGC TTG GCT CAA CAA GAA TGA 1170
TAA GAG GGT CTC AAC TTT ACC ATA GAG TTT GGA ATG GTT TGT GGC 1215
CTT TGG ATA GAA GAT GTG TTA GAG AAT GTT ATT CAT TTG GGA TGG 1260
AGA CAT TGT TGA AGC TTG ATT TGA AAG GGA CTA GGA GAT TGT TTG 1305
ACG CTT TCT TTG ATC TTG ATC CTA AAT ACT GGC AAG GGT TCC TTT 1350
CTT CAA GAT TGT CTG TCA AAG AAA CTT GGT TTA CTC AGC TTG TGT 1395
CTT TTC GGA CAT GGC TCA AAC ATG ACT AGG TTG GGA TAT TGT TAC 1440
AAA ATG TCC TCT TCC TTT GGT TAG ACT GAT TGG CAA TCT AGC AAT 1485
AGA GAG CCT TTG AAA TGT GAA AAG TTT GAA TCA TTT TCT TCA TTT 1530
TAA TTT CTT TGA TTA TTT TCA TAT TTT CTC AAT TGC AGA ATG AGA 1575
TAA AAA CTA CAT ACT GTC GAC AAA TAA ACT ACT ATT GGA ANG TTA 1620
AAA TAA TGT GTG TGT TGN ATG TTA NGC CTA ATG GAA NGG ATG NGG 1665
TTA NGC AAT TTA TGA ACT GNN CGC TCT GTT CGC TTA AAA NCC TTG 1710
GTT CCA CCT TAA NGG AAN GGN CCG GCC ATT 1740



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 498

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Met Glu Ala Leu Leu Lys Pro Phe Pro Ser Leu Leu Leu Ser Ser

5 10 15

Pro Thr Pro His Arg Ser Ile Phe Gln Gln Asn Pro Ser Phe Leu

20 25 30

Ser Pro Thr Thr Lys Lys Lys Ser Arg Lys Cys Leu Leu Arg Asn

35 40 45

Lys Ser Ser Lys Leu Phe Cys Ser Phe Leu Asp Leu Ala Pro Thr

50 55 60

Lys Pro Glu Ser Leu Asp Val Asn Ile Ser Trp Val Asp Pro Asn

65 70 75

Asn Ser Arg Ala Gln Phe Asp Val Ile Ile Ile Gly Ala Gly

80 85 90

Pro Ala Gly Leu Arg Leu Ala Glu Gln Val Ser Lys Tyr Gly Ile

95 100 105

Lys Val Cys Cys Val Asp Pro Ser Pro Leu Ser Met Trp Pro Asn

110 115 120

Asn Tyr Gly Val Trp Val Asp Glu Phe Glu Asn Leu Gly Leu Glu

125 130 135

Asn Cys Leu Asp His Lys Trp Pro Met Thr Cys Val His Ile Asn

140 145 150

Asp Asn Lys Thr Lys Tyr Leu Gly Arg Pro Tyr Gly Arg Val Ser

155 160 165

Arg Lys Lys Leu Lys Leu Lys Leu Leu Asn Ser Cys Val Glu Asn

170 175 180

Arg Val Lys Phe Tyr Lys Ala Lys Val Trp Lys Val Glu His Glu

185 190 195

Glu Phe Glu Ser Ser Ile Val Cys Asp Asp Gly Lys Lys Ile Arg

200 205 210

Gly Ser Leu Val Val Asp Ala Ser Gly Phe Ala Ser Asp Phe Ile

215 220 225

Glu Tyr Asp Arg Pro Arg Asn His Gly Tyr Gln Ile Ala His Gly

230 235 240

Val Leu Val Glu Val Asp Asn His Pro Phe Asp Leu Asp Lys Met

245 250 255

Val Leu Met Asp Trp Arg Asp Ser His Leu Gly Asn Glu Pro Tyr

260 265 270

Leu Arg Val Asn Asn Ala Lys Glu Pro Thr Phe Leu Tyr Ala Met

275 280 285

Pro Phe Asp Arg Asp Leu Val Phe Leu Glu Glu Thr Ser Leu Val

290 295 300

Ser Arg Pro Val Leu Ser Tyr Met Glu Val Lys Arg Arg Met Val

305 310 315

Ala Arg Leu Arg His Leu Gly Ile Lys Val Lys Ser Val Ile Glu

320 325 330

Glu Glu Lys Cys Val Ile Pro Met Gly Gly Pro Leu Pro Arg Ile

335 340 345

Pro Gln Asn Val Met Ala Ile Gly Gly Asn Ser Gly Ile Val His

350 355 360

Pro Ser Thr Gly Tyr Met Val Ala Arg Ser Met Ala Leu Ala Pro

365 370 375

Val Leu Ala Glu Ala Ile Val Glu Gly Leu Gly Ser Thr Arg Met

380 385 390

Ile Arg Gly Ser Gln Leu Tyr His Arg Val Trp Asn Gly Leu Trp

395 400 405

Pro Leu Asp Arg Arg Cys Val Arg Glu Cys Tyr Ser Phe Gly Met

410 415 420

Glu Thr Leu Leu Lys Leu Asp Leu Lys Gly Thr Arg Arg Leu Phe

425 430 435

Asp Ala Phe Phe Asp Leu Asp Pro Lys Tyr Trp Gln Gly Phe Leu

440 445 450

Ser Ser Arg Leu Ser Val Lys Glu Leu Gly Leu Leu Ser Leu Cys

455 460 465

Leu Phe Gly His Gly Ser Asn Met Thr Arg Leu Asp Ile Val Thr

470 475 480

Lys Cys Pro Leu Pro Leu Val Arg Leu Ile Gly Asn Leu Ala Ile

485 490 495

Glu Ser Leu
498

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 498

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Glu Ala Leu Leu Lys Pro Phe Pro Ser Leu Leu Leu Ser Ser

5 10 15

Pro Thr Pro His Arg Ser Ile Phe Gln Gln Asn Pro Ser Phe Leu

20 25 30

Ser Pro Thr Thr Lys Lys Lys Ser Arg Lys Cys Leu Leu Arg Asn

35 40 45

Lys Ser Ser Lys Leu Phe Cys Ser Phe Leu Asp Leu Ala Pro Thr

50 55 60

Ser Lys Pro Glu Ser Leu Asp Val Asn Ile Ser Trp Val Asp Pro

65 70 75

Asn Ser Asn Arg Ala Gln Phe Asp Val Ile Ile Ile Gly Ala Gly

80 85 90

Pro Ala Gly Leu Arg Leu Ala Glu Gln Val Ser Lys Tyr Gly Ile

95 100 105

Lys Val Cys Cys Val Asp Pro Ser Pro Leu Ser Met Trp Pro Asn

110 115 120

Asn Tyr Gly Val Trp Val Asp Glu Phe Glu Asn Leu Gly Leu Glu

125 130 135

Asn Cys Leu Asp His Lys Trp Pro Met Thr Cys Val His Ile Asn

140 145 150

Asp Asn Lys Thr Lys Tyr Leu Gly Arg Pro Tyr Gly Arg Val Ser

155 160 165

Arg Lys Lys Leu Lys Leu Lys Leu Leu Asn Ser Cys Val Glu Asn

170 175 180

Arg Val Lys Phe Tyr Lys Ala Lys Val Trp Lys Val Glu His Glu

185 190 195

Glu Phe Glu Ser Ser Ile Val Cys Asp Asp Gly Lys Lys Ile Arg

200 205 210

Gly Ser Leu Val Val Asp Ala Ser Gly Phe Ala Ser Asp Phe Ile

215 220 225

Glu Tyr Asp Arg Pro Arg Asn His Gly Tyr Gln Ile Ala His Gly

230 235 240

Val Leu Val Glu Val Asp Asn His Pro Phe Asp Leu Asp Lys Met

245 250 255

Val Leu Met Asp Trp Arg Asp Ser His Leu Gly Asn Glu Pro Tyr

260 265 270

Leu Arg Val Asn Asn Ala Lys Glu Pro Thr Phe Leu Tyr Ala Met

275 280 285

Pro Phe Asp Arg Asp Leu Val Phe Leu Glu Glu Thr Ser Leu Val

290 295 300

Ser Arg Pro Val Leu Ser Tyr Met Glu Val Lys Arg Arg Met Val

305 310 315

Ala Arg Leu Arg His Leu Gly Ile Lys Val Lys Ser Val Ile Glu

320 325 330

Glu Glu Lys Cys Val Ile Pro Met Gly Gly Pro Leu Pro Arg Ile

335 340 345

Pro Gln Asn Val Met Ala Ile Gly Gly Asn Ser Gly Ile Val His

350 355 360

Pro Ser Thr Gly Tyr Met Val Ala Arg Ser Met Ala Leu Ala Pro

365 370 375

Val Leu Ala Glu Ala Ile Val Glu Gly Leu Gly Ser Thr Arg Met

380 385 390

Ile Arg Gly Ser Gln Leu Tyr His Arg Val Trp Asn Gly Leu Trp

395 400 405

Pro Leu Asp Arg Arg Cys Val Arg Glu Cys Tyr Ser Phe Gly Met

410 415 420

Glu Thr Leu Leu Lys Leu Asp Leu Lys Gly Thr Arg Arg Leu Phe

425 430 435

Asp Ala Phe Phe Asp Leu Asp Pro Lys Tyr Trp Gln Gly Phe Leu

440 445 450

Ser Ser Arg Leu Ser Val Lys Glu Leu Gly Leu Leu Ser Leu Cys

455 460 465

Leu Phe Gly His Gly Ser Asn Met Thr Arg Leu Asp Ile Val Thr

470 475 480

Lys Cys Pro Leu Pro Leu Val Arg Leu Ile Gly Asn Leu Ala Ile

485 490 495

Glu Ser Leu
498

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 498

(B) TYPE: amino acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:



Met Glu Thr Leu Leu Lys Pro Phe Pro Ser Leu Leu Leu Ser Ser

5 10 15

Pro Thr Pro Tyr Arg Ser Ile Val Gln Gln Asn Pro Ser Phe Leu

20 25 30

Ser Pro Thr Thr Gln Lys Lys Ser Arg Lys Cys Leu Leu Arg Asn

35 40 45

Lys Ser Ser Lys Leu Phe Cys Ser Phe Leu Asp Leu Ala Pro Thr

50 55 60

Ser Lys Pro Glu Ser Leu Asn Val Asn Ile Ser Trp Val Asp Pro

65 70 75

Asn Ser Gly Arg Ala Gln Phe Asp Val Ile Ile Ile Gly Ala Gly

80 85 90

Pro Ala Gly Leu Arg Leu Ala Glu Gln Val Ser Lys Tyr Gly Ile

95 100 105

Lys Val Cys Cys Val Asp Pro Ser Pro Leu Ser Met Trp Pro Asn

110 115 120

Asn Tyr Gly Val Trp Val Asp Glu Phe Glu Asn Leu Gly Leu Glu

125 130 135

Asp Cys Leu Asp His Lys Trp Pro Met Thr Cys Val His Ile Asn

140 145 150

Asp Asn Lys Thr Lys Tyr Leu Gly Arg Pro Tyr Gly Arg Val Ser

155 160 165

Arg Lys Lys Leu Lys Leu Lys Leu Leu Asn Ser Cys Val Glu Asn

170 175 180

Arg Val Lys Phe Tyr Lys Ala Lys Val Trp Lys Val Glu His Glu

185 190 195

Glu Phe Glu Ser Ser Ile Val Cys Asp Asp Gly Lys Lys Ile Arg

200 205 210

Gly Ser Leu Val Val Asp Ala Ser Gly Phe Ala Ser Asp Phe Ile

215 220 225

Glu Tyr Asp Lys Pro Arg Asn His Gly Tyr Gln Ile Ala His Gly

230 235 240

Val Leu Val Glu Val Asp Asn His Pro Phe Asp Leu Asp Lys Met

245 250 255

Val Leu Met Asp Trp Arg Asp Ser His Leu Gly Asn Glu Pro Tyr

260 265 270

Leu Arg Val Asn Asn Ala Lys Glu Pro Thr Phe Leu Tyr Ala Met

275 280 285

Pro Phe Asp Arg Asn Leu Val Phe Leu Glu Glu Thr Ser Leu Val

290 295 300

Ser Arg Pro Val Leu Ser Tyr Met Glu Val Lys Arg Arg Met Val

305 310 315

Ala Arg Leu Arg His Leu Gly Ile Lys Val Arg Ser Val Ile Glu

320 325 330

Glu Glu Lys Cys Val Ile Pro Met Gly Gly Pro Leu Pro Arg Ile

335 340 345

Pro Gln Asn Val Met Ala Ile Gly Gly Asn Ser Gly Ile Val His

350 355 360

Pro Ser Thr Gly Tyr Met Val Ala Arg Ser Met Ala Leu Ala Pro

365 370 375

Val Leu Ala Glu Ala Ile Val Glu Gly Leu Gly Ser Thr Arg Met

380 385 390

Ile Arg Gly Ser Gln Leu Tyr His Arg Val Trp Asn Gly Leu Trp

395 400 405

Pro Leu Asp Arg Arg Cys Val Arg Glu Cys Tyr Ser Phe Gly Met

410 415 420

Glu Thr Leu Leu Lys Leu Asp Leu Lys Gly Thr Arg Arg Leu Phe

425 430 435

Asp Ala Phe Phe Asp Leu Asp Pro Lys Tyr Trp Gln Gly Phe Leu

440 445 450

Ser Ser Arg Leu Ser Val Lys Glu Leu Gly Leu Leu Ser Leu Cys

455 460 465

Leu Phe Gly His Gly Ser Asn Leu Thr Arg Leu Asp Ile Val Thr

470 475 480

Lys Cys Pro Val Pro Leu Val Arg Leu Ile Gly Asn Leu Ala Val

485 490 495

Glu Ser Leu
498



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 56

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:






Met Glu Ala Leu Leu Lys Pro Phe Pro Ser Leu Leu Leu Ser Ser

5 10 15

Pro Thr Pro Tyr Arg Ser Ile Val Gln Gln Asn Pro Ser Phe Leu

20 25 30

Ser Pro Thr Thr Lys Lys Asn Gln Glu Asn Val Phe Leu Glu Thr

35 40 45

Lys Val Val Asn Phe Phe Val Ala Phe Leu Ile

50 55











(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 nucleic acids

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TGACTTCACC CTTCTTTCTT GTCTTC 26 

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 nucleic acids

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

AGAGTCTGGG TTC 13 

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 nucleic acids

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

CTAGTATCG 9



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 nucleic acids 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

CTAAATAT 8 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 nucleic acids 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

AATTTTCAAA 10 